CloudABI purely operates on file descriptor rights (CAP_*). File
descriptor access modes (O_ACCMODE) are emulated on top of rights.
Instead of accepting the traditional flags argument, file_open() copies
in an fdstat_t object that contains the initial rights the descriptor
should have, but also file descriptor flags that should persist after
opening (APPEND, NONBLOCK, *SYNC). Only flags that don't persist (EXCL,
TRUNC, CREAT, DIRECTORY) are passed in as an argument.
file_open() first converts the rights, the persistent flags and the
non-persistent flags to fflags. It then calls into vn_open(). If
successful, it installs the file descriptor with the requested
rights, trimming off rights that don't apply to the type of
the file that has been opened.
Unlike kern_openat(), this function does not support /dev/fd/*. I can't
think of a reason why we need to support this for CloudABI.
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3235
BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache
NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests
with attributes for totally different files).
Warn very verbosely when this is detected. Linux' NFS client has a similar
warning.
Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity
of warnings; default to 10. (Zero disables; -1 is unlimited.)
Adds a failpoint to aid in validating the warning / behavior with a
non-broken server. Use something like:
sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)'
Reviewed by: rmacklem
Approved by: markj (mentor)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3304
At this point IMMED flag is translated to MNT_NOWAIT flag of VOP_FSYNC(),
hoping that file system implements that (ZFS seems doesn't).
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Before this change SYNCHRONIZE CACHE commands were executed exclusively,
as if they had ORDERED tag. But looking through SCSI specs I've found
no any reason to be so strict. For reads this ordering seems pointless.
For writes it looks less obvious, so I left ordering against preceeding
write commands, while following ones are no longer required to wait.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
is detaching.
This mostly fixes a panic - the reset path shouldn't run whilst
the NIC is being torn down.
It's not locked, so it's "mostly" ok, but most of the rest of
the driver doesn't read sc->invalid with sensible locking. Grr.
The real solution is to cleanly tear down taskqueues in the detach/suspend
phase, but ..
changes in the firmwares since 1.11.27.0 are listed here (straight copy-paste
from the "Release Notes.txt" accompanying the Chelsio Unified Wire 2.11.1.0
release on the website).
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.14.4.0
Date : 08/05/2015
================================================================================
FIXES
-----
BASE:
- Fixes a potential data path hang by properly programming PMTX congestion
threshold settings.
- Fixes a potential initialization error when accessing a configuration file
stored on the flash.
- Fixes a regression where SGE resources can be miss-sized if iWARP is disabled.
ETH:
- Fixes a timing issue that would prevent CR4 links from coming up with some
switches.
FOFCoE:
- Defers fcoe linkdown mailbox command handling till LOGO is sent.
- Updates vlan prio for all outstanding IOs during dcbx update.
ENHANCEMENTS
------------
BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.
ETH:
- Enhances segmentation offload to include VxLAN and Geneve.
- Adds PTP support.
- Adds new interface to allow the driver to query the VI rss table base
addresses.
- Allows the driver to program the SGE ingrext contxt CongDrop field.
OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
and rcv scale factors.
iSCSI:
- Adds support for iscsi segmentatation offload (ISO).
- Adds support for iscsi t10-dif offload.
FOiSCSI:
- Sets FORCE_BIT for cut through processing for FOiSCSI.
FOFCoE:
- Adds support for FCoE BB6.
- Improves WRITE performance.
================================================================================
================================================================================
Version : 1.13.32.0
Date : 03/25/2015
================================================================================
FIXES
-----
BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
adapter configurations)
- Fixes config file based PL_TIMEOUT register programming
ETH:
- Fixes a potential EO UDP SEG header corruption
- Fixes an issue where 1000Base-X was not enabled correctly when using QSA
modules
OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1
FOFCoE:
- Fixes fcoe xchg leaks in linkdown/peer down path
- Fixes cleanup in FCoE linkdown and fixed buf timer flowid abuse
- Fixes fw crash by clearing fcf flowc during bye
FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.
ENHANCEMENTS
------------
BASE:
- Adds support for VFs on PFs 4 to 7
- Adds support for QPs/CQs on any physical and virtual function
ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
- Adds support for CR4 links (BEAN/AEC on 40G TwinAx cables)
OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software
FOISCSI:
- Adds IPv6 support for foiscsi. Keeps backward compatibility with
old foiscsi drivers which doesn't support ipv6.
FOFCoE:
- Added fcoe debug support in flowc dump
================================================================================
================================================================================
Version : 1.12.25.0
Date : 10/22/2014
================================================================================
FIXES
-----
BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Fixes an issue where the link would intermittently fail to come up
- Fixes an issue where adapters with an external PHY couldn't run at 100Mbps
- Fixes an issue where active optical cables were not recognized
- Fixes link advertising issues on T520-BT (speed and pause frames) that would
cause the link to negotiate unexpected settings
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested
ETH:
- Fixes NVGRE Segmentation Offload network header generation.
DCBX:
- Fixes an issue where some settings were not being sent to the switch
correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
FW
- Fixes a firmware crash on DCBX APP information request before link up
FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT
ENHANCEMENTS
-------------
BASE:
- Adds link partner settings reporting when available
- Adds QSA support (in conjunction with QSA VPD)
- Adds T520-BT LED support
- Reports NOTSUPPORTED for modules with an unhandled identifier
DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs
FOiSCSI:
- Add support for multiple iSCSI DDP client
- Sends DHCP renew request when lease expires
================================================================================
22.2. T4 Firmware
+++++++++++++++++
Version : 1.14.4.0
Date : 08/05/2015
================================================================================
FIXES
-----
BASE:
- Fixes a potential initialization error when accessing a configuration file
stored on the flash.
- Initialize PCIE_DBG_INDIR_REQ.Enable to 0, as hardware failed to do so and
register dumps could result in errors.
ETH:
- Fixes an issue that sometimes prevented the link from coming up in CR adapters.
ENHANCEMENTS
------------
BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.
ETH:
- Adds new interface to allow the driver to query the VI rss table base
addresses.
OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
and rcv scale factors.
================================================================================
================================================================================
Version : 1.13.32.0
Date : 03/25/2015
================================================================================
FIXES
-----
BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
adapter configurations)
- Fixes config file based PL_TIMEOUT register programming
ETH:
- Fixes a potential EO UDP SEG header corruption
OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1
FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.
ENHANCEMENTS
------------
ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software
================================================================================
================================================================================
Version : 1.12.25.0
Date : 10/22/2014
================================================================================
FIXES
-----
BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested
DCBX:
- Fixes an issue where some settings were not being sent to the switch
correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
FW
- Fixes a firmware crash on DCBX APP information request before link up
FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT
ENHANCEMENTS
------------
BASE:
- Adds link partner settings reporting when available
- Firmware now reports NOTSUPPORTED for modules with an unhandled identifier
DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs
FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 weeks
Sponsored by: Chelsio Communications
This is required for (more) correct TDMA support. Without it, the
code tries to calculate the required guard interval based on the
current rate, and since this is an 11n NIC and people try using
11n, it calls ath_hal_computetxtime() on an 11n rate which then
panics.
This doesn't fix TDMA slave mode on AR9300 - it just makes it
have one less bug.
Reported by: Berislav Purgar <bpurgar@gmail.com>
It looks like a MODULE_VERSION() can also appear on its own -- there is
no need to use explicitly use DECLARE_MODULE(). Looking at other
modules, this seems common practice.
This kernel module does not require any explicit initialization, but a
module declaration is needed to let the "cloudabi64" kernel module
automatically pull this in.
Obtained from: https://github.com/NuxiNL/freebsd
The stat_put() system call can be used to modify file descriptor
attributes, such as flags, but also Capsicum permission bits. Support
for changing Capsicum bits will be added as soon as its dependent
changes have been pushed through code review.
Obtained from: https://github.com/NuxiNL/freebsd
During vMotion and Clone VMware by default runs multiple sequential 4MB
XCOPY requests same time. If CTL issues reads sequentially in 1MB chunks
for each XCOPY command, reads from different commands are not detected
as sequential by serseq option code and allowed to execute simultaneously.
Such read pattern confused ZFS prefetcher, causing suboptimal disk access.
Issuing all reads same time make serseq code work properly, serializing
reads both within each XCOPY command and between them.
My tests with ZFS pool of 14 disks in RAID10 shows prefetcher efficiency
improved from 37% to 99.7%, copying speed improved by 10-60%, average
read latency reduced twice on HDD layer and by five times on zvol layer.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
CloudABI uses a structure called cloudabi_sockstat_t. Think of it as
'struct stat' for sockets. It is used by functions such as
getsockname(), getpeername(), some of the getsockopt() values, etc.
This change implements the sock_stat_get() system call that returns a
copy of this structure. The accept() system call should also return a
full copy of this structure eventually, but for now we're only
interested in the peer address. Add a TODO() to make sure this is
patched up later on.
Differential Revision: https://reviews.freebsd.org/D3218
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.
By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.
Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.
Differential Revision: https://reviews.freebsd.org/D3259
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.
Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.
Reviewed by: jmg
Differential Revision: https://reviews.freebsd.org/D3303
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)
These will create and destroy a temporary, CPU-local KVA mapping of a specified page.
Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread
Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page
The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.
NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013
This config is already building all modules, so we don't need the
MODULES_EXTRA definition. It was also causing problems to users who
rely on MODULES_OVERRIDE to do the right thing.
Discussed with: ian
defines the keys differently than NIST does, so we have to muck with
key lengths and nonce/IVs to be standard compliant...
Remove the iv from secasvar as it was unused...
Add a counter protected by a mutex to ensure that the counter for GCM
and ICM will never be repeated.. This is a requirement for security..
I would use atomics, but we don't have a 64bit one on all platforms..
Fix a bug where IPsec was depending upon the OCF to ensure that the
blocksize was always at least 4 bytes to maintain alignment... Move
this logic into IPsec so changes to OCF won't break IPsec...
In one place, espx was always non-NULL, so don't test that it's
non-NULL before doing work..
minor style cleanups...
drop setting key and klen as they were not used...
Enforce that OCF won't pass invalid key lengths to AES that would
panic the machine...
This was has been tested by others too... I tested this against
NetBSD 6.1.5 using mini-test suite in
https://github.com/jmgurney/ipseccfgs and the only things that don't
pass are keyed md5 and sha1, and 3des-deriv (setkey syntax error),
all other modes listed in setkey's man page... The nice thing is
that NetBSD uses setkey, so same config files were used on both...
Reviewed by: gnn
overflows the stack during root mount in some configurations.
Tested by: Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
of the timehands, from the kern_tc.c implementation to vdso. Add
comments giving hints where to look for the algorithm explanation.
To compensate the removal of rmb() in userspace binuptime(), add
explicit lfence instruction before rdtsc. On i386, add usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.
Reviewed by: alc, bde (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3287
The filter is called from the network hot path and must not sleep.
The filter runs with the descriptor lock held and does not manipulates the
buffers, so it is not necessary sleep when the hold buffer is in use.
Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).
This fix the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).
PR: 200323
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
a pcb from stoppcbs[] rather than the thread's PCB. However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup. To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3193
which constitute the majority of the pages that are processed by
vm_fault_dontneed(), are already near the tail of the inactive queue. Only
the pages at faulting virtual addresses are actually moved by
vm_page_advise(..., MADV_DONTNEED). However, vm_page_advise(...,
MADV_DONTNEED) is simultaneously too aggressive and passive for the moved
pages. It makes most of these pages too easily reclaimable, and at the same
time it leaves enough pages in the active queue to trigger pageouts by the
page daemon. Instead, with this change, the pages at faulting virtual
addresses are moved to the tail of the inactive queue, where they are
relatively close to the pages prefetched by the same page fault.
Discussed with: jeff
Sponsored by: EMC / Isilon Storage Division
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.
Note that this still results in a leak of r/w/e counters. It seems
to be harmless, though. If anyone knows a better way to approach
this - please tell.
Discussed with: kib@, mav@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3050
For example, without this patch, the following three lines
in /boot/loader.conf would result in /boot/root.img being preloaded
twice, and two md(4) devices - md0 and md1 - being created.
initmd_load="YES"
initmd_type="md_image"
initmd_name="/boot/root.img"
Reviewed by: marcel@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3204
When doing a data abort from userland it is possible to get
more than one data abort inside the same exception level.
Add an appropriate exception number to allow nesting of
data_abort handler for EL0.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3276
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().
Reviewed by: kib
reported, on APs. We already did this on BSP.
Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving. In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID. Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.
Reported and tested by: Andre Meiser <ortadur@web.de>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
- The existing TCP INP_INFO lock continues to protect the global inpcb list
stability during full list traversal (e.g. tcp_pcblist()).
- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
and free) and inpcb global counters.
It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.
PR: 183659
Differential Revision: https://reviews.freebsd.org/D2599
Reviewed by: jhb, adrian
Tested by: adrian, nitroboost-gmail.com
Sponsored by: Verisign, Inc.
hangs on shutdown with LUNs with mounted filesystems over a disconnected
iSCSI session.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3052
in savectx where it will be used to store the current state however will
pass in a pcb when vfp_save_state expected a thread pointer.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
The check introduced by r285946 failed to add the dependency on
opt_kstack_pages.h which meant the default value for the platform instead
of the customised options KSTACK_PAGES=X was being tested.
Also wrap in #ifdef __FreeBSD__ for portability.
MFC after: 3 days
Sponsored by: Multiplay
Certain system calls have quirks applied to make them work as if called
on an older version of FreeBSD. As CloudABI executables don't have the
FreeBSD OS release number in the ELF header, this value is set to zero,
making the system calls fall back to typically historic, non-standard
behaviour.
Reviewed by: kib
to double check for if the card has probed before. In fact, there's no
reason to single check either. Simplify the code as a result.
$FreeBSD$ added to lxutil.c in a non-standard way to help keep the
diffs with upstream to a minimum.
Differential Revision: https://reviews.freebsd.org/D3263
o remove disabled code;
o if nexthop address is link-local, use embedded scope zone id to
determine outgoing interface;
o properly fill ro_dst before doing route lookup;
o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.
Sponsored by: Yandex LLC
When a process is exiting, there is a narrow window where p_cred may be
NULL while its threads are still executing. Specifically, the last thread
to exit a process sets the process state to PRS_ZOMBIE with the proc
spinlock held and then calls thread_exit(). thread_exit() drops the spin
lock, permitting the process to be reaped and thus causing its cred struct
to be released. However, the exiting thread may still cause DTrace probes
to fire by calling sched_throw(), resulting in a double fault if such a
probe enabling attempts to access the GID or UID DIF variables.
The thread's cred reference is not susceptible to this race since it is not
released until after the thread has exited.
MFC after: 1 week
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3205
Some of FDT blobs for AM335x-based devices use pinmux pullup/pulldown
flag to setup initial GPIO ouputp value, e.g. 4DCAPE-43 sets LCD DATAEN
signal this way. It works for Linux because Linux driver does not enforce
pin direction until after it's requested by consumer. So input with pullup
flag set acts as output with GPIO_HIGH value
Reviewed by: loos
original parent. Otherwise the debugee will be set as an orphan of
the debugger.
Add tests for tracing forks via PT_FOLLOW_FORK.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2809
Summary:
Use the newly created `kern_shm_open()` function to create objects with
just the rights that are actually needed.
Reviewers: jhb, kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3260
This allows you to specify the capabilities that the new file descriptor
should have. This allows us to create shared memory objects that only
have the rights we're interested in.
The idea behind restricting the rights is that it makes it a lot easier
for CloudABI to get consistent behaviour across different operating
systems. We only need to make sure that a shared memory implementation
consistently implements the operations that are whitelisted.
Approved by: kib
Obtained from: https://github.com/NuxiNL/freebsd
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).
The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS(). If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet. If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.
Update the comment in ROTATE_BUFFERS macro to match the logic described
here.
While here fix a few typos in comments.
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
The buffer must be allocated (or even changed) before the interface is set
and thus, there is no need to verify if the buffer is in use.
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).
If we are using zero-copy buffers, the userland program must register its
buffers before set the interface.
If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.
This fix allows the usage of BIOCSETBUFMODE after r235746.
Update the comments to reflect the recent changes.
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
unwind_frame function to read each stack frame until either the pc or stack
are no longer withing the kernel's address space.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
This is copied from the amd64 version with minor changes. These should be
merged into a single file as from a quick look there are other copies of
the same file in other parts of the tree.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
when trying to read data from outside the DMAP region. I expect this panic
to be from within uiomove_fromphys, which needs to grow support to support
such addresses.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.
Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.
Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().
Obtained from: https://github.com/NuxiNL/freebsd
still on going, but it has passed world for mips and powerpc...
I know this has an extra semicolon, but this is the patch that is
tested...
Looks like better fix is to use _Static_assert...
has observable overhead when the buffer pages are not resident or not
mapped. The overhead comes at least from two factors, one is the
additional work needed to detect the situation, prepare and execute
the rollbacks. Another is the consequence of the i/o splitting into
the batches of the held pages, causing filesystems see series of the
smaller i/o requests instead of the single large request.
Note that expected case of the resident i/o buffer does not expose
these issues. Provide a prefaulting for the userspace i/o buffers,
disabled by default. I am careful of not enabling prefaulting by
default for now, since it would be detrimental for the applications
which speculatively pass extra-large buffers of anonymous memory to
not deal with buffer sizing (if such apps exist).
Found and tested by: bde, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
use CTASSERTs now that we have them...
Replace a draft w/ RFC that's over 10 years old.
Note that _AALG and _EALG do not need to match what the IKE daemons
think they should be.. This is part of the KABI... I decided to
renumber AESCTR, but since we've never had working AESCTR mode, I'm
not really breaking anything.. and it shortens a loop by quite
a bit..
remove SKIPJACK IPsec support... SKIPJACK never made it out of draft
(in 1999), only has 80bit key, NIST recommended it stop being used
after 2010, and setkey nor any of the IKE daemons I checked supported
it...
jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6
Reviewed by: gnn (earlier version)
The IPsec SA statistic keeping is used even for decision making on expiry/rekeying SAs.
When there are multiple transformations being done the statistic keeping might be wrong.
This mostly impacts multiple encapsulations on IPsec since the usual scenario it is not noticed due to the code path not taken.
Differential Revision: https://reviews.freebsd.org/D3239
Reviewed by: ae, gnn
Approved by: gnn(mentor)
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
Brightness is controlled through sysctl dev.gpiobacklight.X.brightness:
- any value greater than 0: backlight is on
- any value less than or equal to 0: backlight is off
FDT bindings docs in Linux tree:
Documentation/devicetree/bindings/video/backlight/gpio-backlight.txt
In tf_dequeue(), if we reach the end of the list without finding a
non-cancelled element, "tmp" will be a pointer into the list head, so the
tmp->canceled check is bogus. Use a flag instead.
Submitted by: Tao Liu <Tao.Liu@isilon.com>
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3244
the VM_FAULT_CHANGE_WIRING flag to VM_FAULT_WIRE. Assert that the
flag is only passed when faulting on the wired map entry. Remove the
vm_page_unwire() call, which should be never reachable.
Since VM_FAULT_WIRE flag implies wired map entry, the TRYPAGER() macro
is reduced to the testing of the fs.object having a default pager.
Inline the check.
Suggested and reviewed by: alc
Tested by: pho (previous version)
MFC after: 1 week
The check added in r285872 can trigger for valid buffers if the buffer space
used happens to be just after unmapped_buf in KVA space.
Discussed with: kib
Sponsored by: Citrix Systems R&D
FreeBSD provides a feature called Adaptive Mutexes, which allows
a thread to spin for a while when the mutex is taken instead of
immediately going to sleep. This causes issues when called from
syscall handler if interrupts are masked. If every other core
also attempts to access the same mutex there is a chance that
all of them are spinning on the same lock at the same time.
If interrupts are disabled, no kernel preemtion can occur and
the system becomes unresponsive.
This patch enables interrupts when syscall is being executed
and masks them as soon as it is completed.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3246
This is a clean-up patch from a serie delivering support for
Annapurna Labs Alpine PoC.
The HAL files have already been added to sys/contrib/alpine-hal
so there is no need for them in the platform directory.
This patch removes obsolete files.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D3248
the blkfront driver to perform I/Os of up to 2 MB, subject to support from
the blkback to which it is connected and the initiation of such large I/Os
by the rest of the kernel. In practice, the I/O size is increased from 40 kB
to 128 kB.
The changes to xen/interface/io/blkif.h consist merely of merging updates
from the upstream Xen repository.
In dev/xen/blkfront/block.h we add some convenience macros and structure
fields used for indirect-page I/Os: The device records its negotiated limit
on the number of indirect pages used, while each I/O command structure gains
permanently allocated page(s) for indirect page references and the Xen grant
references for those pages.
In dev/xen/blkfront/blkfront.c we now check in xbd_queue_cb whether a request
is small enough to handle without an indirection page, and either follow the
previous behaviour or use new code for issuing an indirect segment I/O. In
xbd_connect we read the size of indirect segment I/Os supported by the backend
and select the maximum size we will use; then allocate the pages and Xen grant
references for each I/O command structure. In xbd_free those grants and pages
are released.
A new loader tunable, hw.xbd.xbd_enable_indirect, can be set to 0 in order to
disable this functionality; it works by pretending that the backend does not
support this feature. Some backends exhibit a loss of performance with large
I/Os, so users may wish to test with and without this functionality enabled.
Reviewed by: royger
MFC after: 3 days
Relnotes: yes
R_Free(). This matches the other macros and reduces the chances to clash
with other headers.
This also fixes the build of radix.c outside of the kernel environment.
Reviewed by: glebius
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
Detected while reading the code looking for another bug.
MFC after: 3 days
When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues.
This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route.
Differential Revision: https://reviews.freebsd.org/D3037
Reviewed by: gnn
Approved by: gnn(mentor)
ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it.
Gather all this code together to make it readable and properly handle the reloop cases.
Some of the issues identified:
M_IP_NEXTHOP is not handled properly in existing code.
route reference leaking is possible with in FIB number change
route flags checking is not consistent in the function
Differential Revision: https://reviews.freebsd.org/D3022
Reviewed by: gnn
Approved by: gnn(mentor)
MFC after: 4 weeks
non-inline urgent data and introduce an mbuf exhaustion attack vector
similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs.
Address the issue described in FreeBSD-SA-15:15.tcp.
Reviewed by: glebius
Approved by: so
Approved by: jmallett (mentor)
Security: FreeBSD-SA-15:15.tcp
Sponsored by: Norse Corp, Inc.
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.
Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.
Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.
Test Plan:
CloudABI pipes seem to be created with proper rights in place:
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44
Reviewers: jilles, mjg
Reviewed By: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3236
falloc_noinstall() followed by finstall() allows you to create and
install file descriptors with custom capabilities. Add falloc_caps()
that can do both of these actions in one go.
This will be used by CloudABI to create pipes with custom capabilities.
Reviewed by: mjg
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.
Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
CloudABI's openat() ensures that files are opened with the smallest set
of relevant rights. For example, when opening a FIFO, unrelated rights
like CAP_RECV are automatically removed. To remove unrelated rights, we
can just reuse the code for this that was already present in the rights
conversion function.
Limit the number of supported device IDs to 0x100000
in order to decrease the size of the ITS device table so
that it matches with the HW capabilities.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3131
Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.
Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
problems that was introduced in r285336... I have verified that
HMAC-SHA2-256 both ah only and w/ AES-CBC interoperate w/ a NetBSD
6.1.5 vm...
Reviewed by: gnn
When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to
initialise the fr_flags field. As a result we sometimes mistakenly thought the
fragment to not be a buffered fragment. This resulted in panics because we'd end
up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it
to be part of V_pf_cachequeue).
The next time we iterated V_pf_fragqueue we'd use a freed object and panic.
While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.
PR: 201879, 201932
MFC after: 5 days
Summary:
CloudABI's readdir() system call could be thought of as a mixture
between FreeBSD's getdents(2) and pread(). Instead of using the file
descriptor offset, userspace provides a 64-bit cloudabi_dircookie_t
continue reading at a given point. CLOUDABI_DIRCOOKIE_START, having
value 0, can be used to return entries at the start of the directory.
The file descriptor offset is not used to store the cookie for the
reason that in a file descriptor centric environment, it would make
sense to allow concurrent use of a single file descriptor.
The remaining space returned by the system call should be filled with a
partially truncated copy of the next entry. The advantage of doing this
is that it gracefully deals with long filenames. If the C library
provides a buffer that is too small to hold a single entry, it can still
extract the directory entry header, meaning that it can retry the read
with a larger buffer or skip it using the cookie.
Test Plan:
This implementation passes the cloudlibc unit tests at:
https://github.com/NuxiNL/cloudlibc/tree/master/src/libc/dirent
Reviewers: marcel, kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3226
'buf' is inconvenient and has lead me to some irritating to discover
bugs over the years. It also makes it more challenging to refactor
the buf allocation system.
- Move swbuf and declare it as an extern in vfs_bio.c. This is still
not perfect but better than it was before.
- Eliminate the unused ffs function that relied on knowledge of the buf
array.
- Move the shutdown code that iterates over the buf array into vfs_bio.c.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
While here, reduce the style diff with Linux.
There is no functional change. The goal is to ease the future update to
Linux 3.8's i915 driver.
MFC after: 2 months
attached to bufs to avoid the overhead of the vm. This purposes is now
better served by vmem. Freeing the kva immediately when a buf is
destroyed leads to lower fragmentation and a much simpler scan algorithm.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
enqueueing the frames when it fails. This way there is some latency
removed from the transmitting path.
- If IFF_DRV_OACTIVE is set (and also if IFF_DRV_RUNNING is not) just
enqueue the desired frames and return successful transmit. This way we
avoid to return errors on transmit side and resulting in
possible out-of-order frames. Please note that IFF_DRV_OACTIVE is set
everytime we get the threshold ring hit, so this can be happening quite
often.
Submitted by: Attilio.Rao@isilon.com
MFC after:5 days
On some platforms, the /cpus node contains cpu-to-cluster
map which deffinitely is not a CPU node. Its presence was
causing incrementing of "id" variable and reporting more
CPUs available than it should.
To make "id" valid, increment it only when an entry really
is a CPU device.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3216
CloudABI uses a system call interface to modify file attributes that is
more similar to KPI's/FUSE, namely where a stat structure is passed back
to the kernel, together with a bitmask of attributes that should be
changed. This would allow us to update any set of attributes atomically.
That said, I'd rather not go as far as to actually implement it that
way, as it would require us to duplicate more code than strictly needed.
Let's just stick to the combinations that are actually used by
cloudlibc.
Obtained from: https://github.com/NuxiNL/freebsd
As ZFS requires a more kernel stack pages than is the default on some
architectures e.g. i386, warn if KSTACK_PAGES is less than
ZFS_MIN_KSTACK_PAGES (which is 4 at the time of writing).
MFC after: 3 days
Sponsored by: Multiplay
Remove NAKing limit and pause IN and OUT transactions for 125us in
case of NAK response for BULK and CONTROL endpoints. This gets the
receive latency down and improves USB network throughput at the cost
of some CPU usage.
MFC after: 1 month
ordering semantic of x86 CPUs makes only the compiler barrier
neccessary to give the acquire behaviour.
Existing implementation ensured sequentially consistent semantic for
load_acq, making much stronger guarantee than required by standard's
definition of the load acquire. Consumers which depend on the barrier
are believed to be identified and already fixed to use proper
operations.
Noted by: alc (long time ago)
Reviewed by: alc, bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
sosend(). The only release on the xp_snt_cnt is done after sosend(),
with an intent to synchronize with load_acq in svc_vc_ack().
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The file_create() system call can be used to create files of a given
type. Right now it can only be used to create directories and FIFOs. As
CloudABI does not expose filesystem permissions, this system call lacks
a mode argument. Simply use 0777 or 0666 depending on the file type.
Summary:
CloudABI provides access to two different stat structures:
- fdstat, containing file descriptor level status: oflags, file
descriptor type and Capsicum rights, used by cap_rights_get(),
fcntl(F_GETFL), getsockopt(SO_TYPE).
- filestat, containing your regular file status: timestamps, inode
number, used by fstat().
Unlike FreeBSD's stat::st_mode, CloudABI file descriptor types don't
have overloaded meanings (e.g., returning S_ISCHR() for kqueues). Add a
utility function to extract the type of a file descriptor accurately.
CloudABI does not work with O_ACCMODEs. File descriptors have two sets
of Capsicum-style rights: rights that apply to the file descriptor
itself ('base') and rights that apply to any new file descriptors
yielded through openat() ('inheriting'). Though not perfect, we can
pretty safely decompose Capsicum rights to such a pair. This is done in
convert_capabilities().
Test Plan: Tests for these system calls are fairly extensive in cloudlibc.
Reviewers: jonathan, mjg, #manpages
Reviewed By: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3171
xhci_start_controller() to xhci_init(). These values don't change at run-
time so there's no point of acquiring them on every USB_HW_POWER_RESUME
instead of only once during initialization. In r276717, reading the first
couple of registers in question already had been moved as a prerequisite
for the changes in that revision.
- Identify ASMedia ASM1042A controllers.
- Use NULL instead of 0 for pointers.
MFC after: 3 days
Summary:
Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right.
Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries.
I took a look at the XNU sources and they seem to test against both SS_ISCONNECTED, SS_ISCONNECTING and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seams reasonable, so let's do the same.
Test Plan:
This issue was uncovered while writing tests for shutdown() in CloudABI:
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/sys/socket/shutdown_test.c#L26
Reviewers: glebius, rwatson, #manpages, gnn, #network
Reviewed By: gnn, #network
Subscribers: bms, mjg, imp
Differential Revision: https://reviews.freebsd.org/D3039
variant of Microsoft RNDIS, i. e. their unofficial version of CDC ACM,
has been disabled in r261544 for resolving a conflict with umodem(4).
Eventually, in r275790 that problem was dealt with in the right way.
However, r275790 failed to put probing of RNDIS devices in question
back.
- Initialize the device prior to querying it, as required by the RNDIS
specification. Otherwise already determining the MAC address may fail
rightfully.
- On detach, halt the device again.
- Use UCDC_SEND_ENCAPSULATED_{COMMAND,RESPONSE}. While these macros are
resolving to the same values as UR_{CLEAR_FEATURE,GET_STATUS}, the
former set is way more appropriate in this context.
- Report unknown - rather: unimplemented - events unconditionally and
not just in debug mode. This ensures that we'll get some hint of what
is going wrong instead of the driver silently failing.
- Deal with the Microsoft ActiveSync requirement of using an input buffer
the size of the expected reply or larger - except for variably sized
replies - when querying a device.
- Fix some pointless NULL checks, style bugs etc.
This changes allow urndis(4) to communicate with a Microsoft-certified
USB RNDIS test token.
MFC after: 3 days
Sponsored by: genua mbh
Summary:
CloudABI provides two different types of futex objects: read-write locks
and condition variables. There is no need to provide separate support
for once objects and thread joining, as these are efficiently simulated
by blocking on a read-write lock. Mutexes simply use read-write locks.
Condition variables always have a lock object associated to them. They
always know to which lock a thread needs to be migrated if woken up.
This allows us to implement requeueing. A broadcast on a condition
variable will never cause multiple threads to be woken up at once. They
will be woken up iteratively.
This implementation still has lots of room for improvement. Locking is
coarse and right now we use linked lists to store all of the locks and
condition variables, instead of using a hash table. The primary goal of
this implementation was to behave correctly. Performance will be
improved as we go.
Test Plan:
This futex implementation has been in use for the last couple of months
and seems to work pretty well. All of the cloudlibc and libc++ unit
tests seem to pass.
Reviewers: dchagin, kib, vangyzen
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3148
Futex object scopes have been renamed from using their own constants to
simply reusing the existing CLOUDABI_MAP_{PRIVATE,SHARED} flags, as they
are more accurate in this context.
Currently LOCK_DEBUG is always defined in sys/lock.h (0 or 1).
This means that debugging code always built. In addition the kernel
modules have always defined LOCK_DEBUG as 1. So, debugging rmlock code
is always used by kernel modules.
MFC after: 1 week
determining whether a node changed.
Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.
This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).
PR: 201284
Submitted by: David Binderman
Patch suggested by: kib
Reviewed by: kib
MFC after: 2 weeks
Committed from: Essen FreeBSD Hackathon
Assume that a vnode is mapped shared and mlocked(), and then the vnode
is truncated, or truncated and then again extended past the mapping
point EOF. Truncation removes the pages past the truncation point,
and if pages are later created at this range, they are not properly
mapped into the mlocked region, and their wiring count is wrong.
The revert leaves the invalidated but wired pages on the object queue,
which means that the pages are found by vm_object_unwire() when the
mapped range is munlock()ed, and reused by the buffer cache when the
vnode is extended again.
The changes in r173708 were required since then vm_map_unwire() looked
at the page tables to find the page to unwire. This is no longer
needed with the vm_object_unwire() introduction, which follows the
objects shadow chain.
Also eliminate OBJPR_NOTWIRED flag for vm_object_page_remove(), which
is now redundand, we do not remove wired pages.
Reported by: trasz, Dmitry Sivachenko <trtrmitya@gmail.com>
Suggested and reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
b_kvabase when the buffer is reclaimed. Otherwise, if b_data for the
mapped buffer was adjusted with the page-offset portion of b_offset,
nothing would re-adjust the b_data, which breaks buffer management
code which expects page-aligned b_data (see e.g. bpman_qenter(), which
skips partial pages).
Fix a minor issue with the GB_KVAALLOC requests, which could result in
returning the mapped buffer if the reused buffer is mapped and have
the right amount of KVA reserved.
Improve assertion in the vfs_buf_check_mapped() to catch unmapped
buffers which have their b_data incorrectly adjusted with offset.
Reported and tested by: pho (previous version)
Reviewed by: jeff (previous version)
Sponsored by: The FreeBSD Foundation
but it's hard to find and easy to miss.
Reviewed by: wblock@
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3183
uart_bus_attach() during its test that 20 iterations weren't sufficient
for clearing all pending interrupts, assuming this means that hardware
is broken and doesn't deassert interrupts. However, under pressure, 20
iterations also can be insufficient for clearing all pending interrupts,
leading to a panic as intr_event_handle() tries to schedule an interrupt
handler not registered. Solve this by introducing a flag that is set in
test mode and otherwise restores pre-r253161 behavior of uart_intr(). The
approach of additionally registering uart_intr() as handler as suggested
in PR 194979 is not taken as that in turn would abuse special pccard and
pccbb handling code of intr_event_handle(). [1]
- Const'ify uart_driver_name.
- Fix some minor style bugs.
PR: 194979 [1]
Reviewed by: marcel (earlier version)
MFC after: 3 days
raw data to the doorbell offset in order to clarify the intent and for
avoiding unnecessarily converting the endianess back and forth.
Unfortunately, the same can't be done in mpt_recv_handshake_reply() as
16-bit data needs to be read using 32-bit bus accessors.
- In mpt_recv_handshake_reply(), get rid of a redundant variable.
MFC after: 1 fortnight
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs due to the following reasons:
- The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and direct dispatch IPIs such as the
cache invalidation etc. IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI by itself
while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
deadlock.
o As mentioned in the commit message of r245850, on least some sun4u platforms
concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
trying to be delivered to APs that in fact aren't up and running, yet.
While at it, move setting of the cpu_ipi_{selected,single}() pointers to
the appropriate delivery functions from mp_init() to cpu_mp_start() where
it's better suited and allows to get rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
the delays before completely disabling interrupts again in the CPU-specific
cross-trap delivery functions, previously giving other CPUs a window for
sending IPIs on their part. Actually, we now should be able to entirely get
rid of completely disabling interrupts in these functions. Such a change
needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
necessary for correctness, this avoids page faults when accessing the stack
of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
possible, avoiding jitter.
PR: 201245
MFC after: 3 days
Summary:
Unlike FreeBSD, CloudABI does not use null terminated strings for its
pathnames. Introduce a function called copyin_path() that can be used by
all of the filesystem system calls that use pathnames. This change
already implements the system calls that don't depend on any additional
functionality (e.g., conversion of struct stat).
Also implement the socket system calls that operate on pathnames, namely
the ones used by the C library functions bindat() and connectat(). These
don't receive a 'struct sockaddr_un', but just the pathname, meaning
they could be implemented in such a way that they don't depend on the
size of sun_path. For now, just use the existing interfaces.
Add a missing #include to cloudabi_syscalldefs.h to get this code to
build, as one of its macros depends on UINT64_C().
Test Plan:
These implementations have already been tested in the CloudABI branch on
GitHub. They pass all of the tests.
Reviewers: kib, pjd
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3097
- Use pointer assignment rather than a combination of pointers and
flags to switch buffers between unmapped and mapped. This eliminates
multiple flags and generally simplifies the logic.
- Eliminate b_saveaddr since it is only used with pager bufs which have
their b_data re-initialized on each allocation.
- Gather up some convenience routines in the buffer cache for
manipulating buf space and buf malloc space.
- Add an inline, buf_mapped(), to standardize checks around unmapped
buffers.
In collaboration with: mlaier
Reviewed by: kib
Tested by: pho (many small revisions ago)
Sponsored by: EMC / Isilon Storage Division
In the CloudABI code I sometimes call into cap_rights_* without
providing any arguments. Though one could argue that this doesn't make
sense, in this specific case it's hard to avoid, as the rights that
should be tested against are forwarded by a couple of wrapper macros.
most recently used buffer when we are under paging pressure. This is
a perversion of the buffer and page replacement algorithms and recent
improvements to the page daemon have rendered it unnecessary. In the
event that low-memory deadlocks become an issue it would be possible
to make a daemon or event handler that performs a similar action on
the oldest buffers rather than the newest. Since the buf cache
is analogous to the page cache and some minimum working set is desired
another possibility is to simply shrink the minimum working set which
has less downside now that file pages are not directly mapped.
Sponsored by: EMC / Isilon
Reviewed by: alc, kib (with some minor objection)
Tested by: pho
Apologies, this was how it was supposed to land. Mea culpa.
Differential Revision: https://reviews.freebsd.org/D3157
Reviewed by: gnn, hiren
Approved by: markj (mentor)
MFC after: 1 week
Driver must be able to start against older firmware that is missing
recently added MCDI calls, otherwise firmware upgrade will not be
possible.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D3145
1) We were not handling (or sending) the IN_PROGRESS case if
the other side (or our side) was not able to reset (awaiting more data).
2) We would improperly send a stream-reset when we should not. Not
waiting until the TSN had been assigned when data was inqueue.
Reviewed by: tuexen
- Allocate resources for MSI-X table and PBA if necessary
- Add function ahci_free_mem() to free all resources
Reviewed by: jhb, mav
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3009
Place all of the machine/pointer size independent code in a kernel
module called 'cloudabi'. All of the 64-bit specific code goes in a
separate module called 'cloudabi64'. The latter is only enabled on
amd64, as it is the only architecture supported.
Windows Server 2012 and earlier hosts.
Submitted by: whu
Reviewed by: royger
Approved by: royger
MFC after: 3 days
Relnotes: No
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3086
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.
Differential Revision: https://reviews.freebsd.org/D2784
Reviewed by: kib, markj
MFC after: 2 weeks
The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.
Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2881
If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed function is a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2859
Undefined symbols have a value of zero, so it makes no sense to return
such a symbol when performing a lookup by value. This occurs for example
when unwinding the stack after calling a NULL function pointer, and we
confusingly report the faulting function as uart_sab82532_class() on
amd64.
Convert db_print_loc_and_inst() to only attempt disassembly if we managed
to find a symbol corresponding to the IP. Otherwise we may fault and
re-enter the debugger.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2858
The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2857
done by the functions called on other CPUs, are visible to the caller.
Pair otherwise useless acquire on smp_rv_waiters[3] with a release add
to ensure synchronized with relation, which guarantees visibility.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
- Add required MAC/VLAN filter when adding an LAA
- Fix bug where code did not check for I40E_SUCCESS from a successful
i40e_validate_mac_address() call in ixl_init_locked(), when setting
an LAA.
PR: 201240
Differential Revision: https://reviews.freebsd.org/D3111
Submitted by: Gregory Rose <gregory.v.rose@intel.com>
Reviewed by: gnn, rstone
Approved by: gnn
MFC after: 2 weeks
The NVMe specification has no ability to specify a maximum delete size
that is less than the full capacity of the namespace - so just using the
namespace size is the correct value here.
This fixes reported issues where ZFS trim on init looked like it was
hanging the system - previously the default I/O max size (128KB on
Intel NVMe controllers) was used for delete operations which worked out
to only about 8MB/s. With this patch I can add an 800GB DC P3700
drive to a ZFS pool in about 15-20 seconds.
Reported by: Dylan Just <dylan@techtangents.com>
MFC after: 3 days
Sponsored by: Intel
This feature is inspired by another Unix-alike OS commonly found on
airplane headrests.
A number of beasties[0] are drawn at top of framebuffer during boot,
based on the number of active SMP CPUs[1]. Console buffer output
continues to scroll in the screen area below beastie(s)[2].
After some time[3] has passed, the beasties are erased leaving the
entire terminal for use.
Includes two 80x80 vga16 beastie graphics and an 80x80 vga16 orb
graphic. (The graphics are RLE compressed to save some space -- 3x 3200
bytes uncompressed, or 4208 compressed.)
[0]: The user may select the style of beastie with
kern.vt.splash_cpu_style=(0|1|2)
[1]: Or the number may be overridden with tunable kern.vt.splash_ncpu.
[2]: https://www.youtube.com/watch?v=UP2jizfr3_o
[3]: Configurable with kern.vt.splash_cpu_duration (seconds, def. 10).
Differential Revision: https://reviews.freebsd.org/D2181
Reviewed by: dumbbell, emaste
Approved by: markj (mentor)
MFC after: 2 weeks
Explicitly mark existing VT_SYSCTL_INTs static. This is in preparation for
D2181.
Reviewed by: dumbbell, emaste
Approved by: markj (mentor)
MFC after: 1 week
Add brief commentary to vendor-specific devid function in ITS
and remove redundant spaces by the way.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
malloc() should not go to sleep in case of lack of resource while
the kernel thread is holding a non-sleepable lock.
- change malloc() flags to M_NOWAIT in such cases implement
lpi_free_chunk() routine as it will be needed when ITT
allocation fails in its_device_alloc_locked()
- do not increase verbosity of this code since upper layers will
communicate an error if the interrupt setup fails
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3121
Though the standard C library uses a 'struct timespec' using a 64-bit
'time_t', there is no need to use such a type at the system call level.
CloudABI uses a simple 64-bit unsigned timestamp in nanoseconds. This is
sufficient to express any time value from 1970 to 2554.
The CloudABI low-level interface also supports fetching timestamp values
with a lower precision. Instead of overloading the clock ID argument for
this purpose, the system call provides a precision argument that may be
used to specify the maximum slack. The current system call
implementation does not use this information, but it's good to already
have this available.
Expose cloudabi_convert_timespec(), as we're going to need this for
fstat() as well.
Obtained from: https://github.com/NuxiNL/freebsd
It is possible that some HW will use different PCI devids,
hence allow to replace the default domain🚌slot:func schema
by implementing and registering custom function.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3118
it_need was wrong [*]. Restore the releases and add a comment
explaining why it is needed.
Noted by: alc [*]
Reviewed by: bde [*]
Sponsored by: The FreeBSD Foundation
Use Vritual Counter register associated with Generic Timer to
read the cyclecount.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3134
Summary:
Remove the stub system call that was put in place during the system call
import and replace it by a target-dependent version stored in sys/amd64.
Initialize the thread in a way similar to cpu_set_upcall_kse(). We
provide the entry point with two arguments: the thread ID and the
argument pointer.
Test Plan:
Thread creation still seems to work, both for FreeBSD and CloudABI
binaries.
Reviewers: dchagin, mjg, kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3110
Add a method to identify CPU based on RAW MIDR value.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3117
lock on the INP before calling the tunnel protocol, else a LOR
may occur (it does with SCTP for sure). Instead we must acquire a
ref count and release the lock, taking care to allow for the case
where the UDP socket has gone away and *not* unlocking since the
refcnt decrement on the inp will do the unlock in that case.
Reviewed by: tuexen
MFC after: 3 weeks
Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return
the file descriptor number to the parent process.
To the child process we both return a special value for the file
descriptor number (CLOUDABI_PROCESS_CHILD). We also return the thread ID
of the new thread in the copied process, so the threading library can
reinitialize itself.
Obtained from: https://github.com/NuxiNL/freebsd
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.
This function is going to be used by the CloudABI compatibility layer.
It looks like the OpenSolaris compatibility framework already provides a
function called thread_create(). Rename this function to
do_thread_create() and use a macro to deal with the namespacing
conflict. A similar approach is already used for thread_exit().
MFC after: 1 month
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2993
Remove useless release semantic for some stores to it_need. For
stores where the release is needed, add a comment explaining why.
Fence after the atomic_cmpset() op on the it_need should be acquire
only, release is not needed (see above). The combination of
atomic_cmpset() + fence_acq() is better expressed there as
atomic_cmpset_acq().
Use atomic_cmpset() for swi' ih_need read and clear.
Discussed with: alc, bde
Reviewed by: bde
Comments wording provided by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
with some cards that causes them to become deselected after probing for
switch capabilities. The old workaround fixes the behavior with some cards,
but causes problems with the cards the behave correctly and don't become
deselected. Forcing a deselect then reselect appears to work correctly
with all cards in initial testing.
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).
Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.
Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
a set of differentiated services, set IPTOS_PREC_* macros using
IPTOS_DSCP_* macro definitions.
While here, add IPTOS_DSCP_VA macro according to RFC 5865.
Differential Revision: https://reviews.freebsd.org/D3119
Reviewed by: gnn
LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex, and
we don't want to recurse if lockstat probes are enabled.
PR: 201642
Reviewed by: avg
MFC after: 3 days
enabled. The cost of a timecounter read can be quite significant, and the
problem became more apparent after r284297, since that change resulted in
a call to lockstat_nsecs() for each acquisition of an rwlock read lock.
PR: 201642
Reviewed by: avg
Tested by: Jason Unovitch
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D3073
It turns out that the CDDL sources already introduce a function called
thread_create(). I'll investigate what we can do to make these functions
coexist.
Reported by: Ivan Klymenko
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.
Differential Revision: https://reviews.freebsd.org/D3060
Reviewed by: hiren
Approved by: jmallett (mentor)
MFC after: 3 days
Sponsored by: Norse Corp, Inc.
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.
This function is going to be used by the CloudABI compatibility layer.
Reviewed by: kib
MFC after: 1 month
Basing on B.2.3.4:
Synchronization and coherency issues between data and
instruction accesses.
To ensure that modified instructions are visible to all PEs
(Processing Elements) in a shareability domain one need to
perform following sequence:
1. Clean D-cache
2. Ensure the visibility of data cleaned from cache
3. Invalidate I-cache
4. Ensure completion
5. In SMP system PE must issue isb to ensure execution of the
modified instructions
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3106
Secondary stack calculation is modified to provide
stack_top = secondary_stacks + (cpu_id) * PAGE_SIZE * KSTACK_PAGES
because on ARM64 the stack grows to lower memory addresses.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3107
Previous DMAP size was too small for systems with more than 64GB
of RAM. Increase it to 128GB to support ThunderX CRB.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3113
Add support for the <sys/mman.h> functions by wrapping around our own
implementations. There are no kern_*() variants of these system calls,
but we also don't need them in this case. It is sufficient to just call
into the sys_*() functions.
Differential Revision: https://reviews.freebsd.org/D3033
Reviewed by: brooks
save it for later. This enables direct manipulation of the indirection
tables (although the stock driver doesn't do that right now).
MFC after: 1 month
belongs to the kernel stack address range for the thread. Right now,
code checks that new frame is not farther then KSTACK_PAGES pages from
the current frame, which allows the address to point past the top of
the stack.
Reviewed by: andrew, emaste, markj
Differential revision: https://reviews.freebsd.org/D3108
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Summary:
For CloudABI we need to put two things on the stack of new processes:
the argument data (a binary blob; not strings) and a startup data
structure. The startup data structure contains interesting things such
as a pointer to the ELF program header, the thread ID of the initial
thread, a stack smashing protection canary, and a pointer to the
argument data.
Fetching system call arguments and setting the return value is similar
to FreeBSD. The only differences are that system call 0 does not exist
and that we call into cloudabi_convert_errno() to convert the error
code. We also need this function in a couple of other places, so we'd
better reuse it here.
Reviewers: dchagin, kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3098
1) Use the TX FIFO empty interrupts to poll the transmit FIFO usage,
instead of using own software counters and waiting for SOF
interrupts. Assume that enough FIFO space is available to execute one
USB OUT transfer of any kind when the TX FIFO is empty.
2) Use the host channel halted event to asynchronously wait for host
channels to be disabled instead of waiting for SOF interrupts. This
results in less turnaround time for re-using host channels and at the
same time increases the performance.
The network transmit performance measured by "iperf" for the "RPi-B v1
2011/12" board, increased from 45MBit/s to 65Mbit/s after applying the
changes above.
No regressions seen using:
- High Speed (BULK, CONTROL, INTERRUPT)
- Full Speed (All transfer types)
- Low Speed (Control and Interrupt)
MFC after: 1 month
Submitted by: Daisuke Aoyama <aoyama@peach.ne.jp>
Their primary use was in thread_cow_update to free up old resources.
Freeing had to be done with proc lock held and _cow_ funcs already knew
how to free old structs.
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.
Reviewed by: kib (earlier version)
Tested by: pho
If KSTACK_PAGES was changed to anything alse than the default,
the value from param.h was taken instead in some places and
the value from KENRCONF in some others. This resulted in
inconsistency which caused corruption in SMP envorinment.
Ensure all places where KSTACK_PAGES are used the opt_kstack_pages.h
is included.
The file opt_kstack_pages.h could not be included in param.h
because was breaking the toolchain compilation.
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094
This commit adds proper cache and shareability attributes to
the TCR register.
Set memory attributes to Normal, outer and inner cacheable WBWA.
Set shareability to inner and outer shareable when SMP is enabled.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3093
Summary:
In a runtime that is purely based on capability-based security, there is
a strong emphasis on how programs start their execution. We need to make
sure that we execute an new program with an exact set of file
descriptors, ensuring that credentials are not leaked into the process
accidentally.
Providing the right file descriptors is just half the problem. There
also needs to be a framework in place that gives meaning to these file
descriptors. How does a CloudABI mail server know which of the file
descriptors corresponds to the socket that receives incoming emails?
Furthermore, how will this mail server acquire its configuration
parameters, as it cannot open a configuration file from a global path on
disk?
CloudABI solves this problem by replacing traditional string command
line arguments by tree-like data structure consisting of scalars,
sequences and mappings (similar to YAML/JSON). In this structure, file
descriptors are treated as a first-class citizen. When calling exec(),
file descriptors are passed on to the new executable if and only if they
are referenced from this tree structure. See the cloudabi-run(1) man
page for more details and examples (sysutils/cloudabi-utils).
Fortunately, the kernel does not need to care about this tree structure
at all. The C library is responsible for serializing and deserializing,
but also for extracting the list of referenced file descriptors. The
system call only receives a copy of the serialized data and a layout of
what the new file descriptor table should look like:
int proc_exec(int execfd, const void *data, size_t datalen, const int *fds,
size_t fdslen);
This change introduces a set of fd*_remapped() functions:
- fdcopy_remapped() pulls a copy of a file descriptor table, remapping
all of the file descriptors according to the provided mapping table.
- fdinstall_remapped() replaces the file descriptor table of the process
by the copy created by fdcopy_remapped().
- fdescfree_remapped() frees the table in case we aborted before
fdinstall_remapped().
We then add a function exec_copyin_data_fds() that builds on top these
functions. It copies in the data and constructs a new remapped file
descriptor. This is used by cloudabi_sys_proc_exec().
Test Plan:
cloudabi-run(1) is capable of spawning processes successfully, providing
it data and file descriptors. procstat -f seems to confirm all is good.
Regular FreeBSD processes also work properly.
Reviewers: kib, mjg
Reviewed By: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3079
It appears that the linker will not handle 64-bit relocations at addresses that
are not aligned to 8-byte boundaries. Prior to this change the line:
.llong generictrap
was aligned to a 4-byte address, and the linker replaced that with an 8-byte
0x0. Aligning that address to 8 bytes caused the linker to generate the proper
relocation. As a follow-through, the dblow from trap_subr33.S used the code
sequence 'lwz %r1, TRAP_GENTRAP(0)', so this reproduces the analogue of that for
64-bit.
polling at device attach time [1].
Add tunables 'debug.uart_force_poll' and 'debug.uart_poll_freq' to control
uart polling.
Submitted by: Aleksey Kuleshov (rndfax@yandex.ru) [1]
architectures. Atomic_cmpset_int(9) is a direct replacement, due to
loop. The change fixes arm, arm64, mips an sparc64, which lack
atomic_swap().
Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
current value. It is believed that the change is the real fix for the
issue which was covered over by the r252683.
With the current code, if the interrupt handler sets it_need between
read and consequent reset, the update could be lost and
ithread_execute_handlers() would not be called in response to the lost
update.
The r252683 could have hide the issue since at the moment of commit,
atomic_load_acq_int() did locked cmpxchg on the variable, which puts
the cache line into the exclusive owned state and clears store
buffers. Then the immediate store of zero has very high chance of
reusing the exclusive state of the cache line and make the load and
store sequence operate as atomic swap.
For now, add the acq+rel fence immediately after the swap, to not
disturb current (but excessive) ordering. Acquire is needed for the
ih_need reads after the load, while release does not serve a useful
purpose [*].
Reviewed by: alc
Noted by: alc [*]
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Identify current CPU. This is necessary to setup
affinity registers and to provide support for
runtime chip identification.
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3095
We can map these system calls directly to the FreeBSD counterparts. The
other filesystem related system calls will be sent out for review
separately, as they are a bit more complex to get right.
an if_ixv instance can now set at creation time, and the receive ring
tail pointer is correctly initialized (previously, things still worked
because the receive ring tail pointer was being fixed up as a side
effect of other activity).
Differential Revision: https://reviews.freebsd.org/D2922
Reviewed by: erj, gnn
Approved by: jmallett (mentor)
Sponsored by: Norse Corp, Inc.
I performed the commit on a different system as where I wrote the
change. After pulling in the change from Phabricator, I didn't notice
that a single chunk did not apply.
Approved by: secteam (implicit, as intended change was approved)
Pointy hat to: me
The random_get() system call works similar to getentropy()/getrandom()
on OpenBSD/Linux. It fills a buffer with random data.
This change introduces a new function, read_random_uio(), that is used
to implement read() on the random devices. We can call into this
function from within the CloudABI compatibility layer.
Approved by: secteam
Reviewed by: jmg, markm, wblock
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3053
The first system call is used to set the user TLS address. Right now
this system call is invoked by the C library for both the initial thread
and additional threads unconditionally, but in the future we'll only
call this if the architecture does not support this. On recent x86-64
CPUs we could use the WRFSBASE instruction.
This system call was erroneously placed in sys/compat/cloudabi64, even
though it does not depend on any pointer size dependent datastructure.
Move it to the right place.
Obtained from: https://github.com/NuxiNL/freebsd
Add a routine similar to copyinuio() and freebsd32_copyinuio() that
copies in CloudABI's struct iovecs. These are then translated into
FreeBSD format and placed in a 'struct uio', so we can call into the
kern_*() functions.
Obtained from: https://github.com/NuxiNL/freebsd
Summary:
As discussed with kib@ in response to r285404, don't call into
kern_sigaction() within proc_raise() to reset the signal to the default
action before delivery. We'd better do that during image execution.
Change the code to simply use pksignal(), so we don't waste cycles on
functions like pfind() to look up the currently running process itself.
Test Plan:
This change has also been pushed into the cloudabi branch on GitHub. The
raise() tests still seem to pass.
Reviewers: kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3076
Call arm_init_secondary before any other PIC-related functions
are called. This is necessary for GICv3 where PIC_INIT_SECONDARY
allocates resources needed for all further operations.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3066
On ARMv8 IPIs are mapped to 0-15. Incrementing the number by 16
is wrong, because it sets a reserved bit in the IPI register.
This patch removes all "+16" to comply with specs.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3029
not. When doing multiqueue, we are all setup to have full 32bit RSS hash from
the card. We do not need to hide that under "ifdef RSS" and should expose that
by default so others like lagg(4) can use that and avoid hashing the traffic by
themselves.
While here, delete the FreeBSD version check and use of deprecated M_FLOWID.
Reviewed by: adrian, erj
MFC after: 1 week
Sponsored by: Limelight Networks
Though confusing, GCM using ICM_BLOCK_LEN, but ICM does not is
correct... GCM is built on ICM, but uses a function other than
swcr_encdec... swcr_encdec cannot handle partial blocks which is
why it must still use AES_BLOCK_LEN and is why XTS was broken by the
commit...
Thanks to the tests for helping sure I didn't break GCM w/ an earlier
patch...
I did run the tests w/o this patch, and need to figure out why they
did not fail, clearly more tests are needed...
Prodded by: peter
Comment that cryptodev shouldn't be used unless you know what you're
doing...
The various arm/mips and one powerpc configs that have cryptodev in
them need to be addressed, audited if they provide benefit and removed
if they don't...
unp_dispose and unp_gc could race to teardown the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.
This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.
To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.
PR: 194264
Differential Revision: https://reviews.freebsd.org/D3044
Reviewed by: mjg (earlier version)
Approved by: markj (mentor)
Obtained from: mjg
MFC after: 1 month
Sponsored by: EMC / Isilon Storage Division
IDs in initiator mode -- index in port database instead of handlers.
This makes initiator IDs persist across role changes and firmware resets,
when handlers previously assigned by firmware are lost and reused.
Sponsored by: iXsystems, Inc.
o Return the real hardware state in gpio_pin_getflags() instead of keep
the last state in an internal table. Now the driver returns the real
state of pins (input/output and pull-up/pull-down) at all times.
o Use a spin mutex. This is required by interrupts and the 1-wire code.
o Use better variable names and place parentheses around them in MACROS.
o Do not lock the driver when returning static data.
Tested with gpioled(4) and DS1820 (1-wire) sensors on banana pi.
If a signal is caught in pipelock, causing it to fail, pipe_direct_write
should not try to pipeunlock.
Reported by: pho
Differential Revision: https://reviews.freebsd.org/D3069
Reviewed by: kib
Approved by: markj (mentor)
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division