and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
Each list describes a logical memory object that is backed by one or more
physical address ranges. To minimize locking, the sglist objects
themselves are immutable once they are shared.
These objects may be used in the future to facilitate I/O requests using
physically-addressed buffers. For the immediate future I plan to use them
to implement a new type of VM object and pager.
Reviewed by: jeff, scottl
MFC after: 1 month
with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option
as there is no more space for more SOL_SOCKET options, but this option also
fits better as an IP socket option, it seems.
- Implement this functionality also for IPv6 and RAW IP sockets.
- Always compile it in (don't use additional kernel options).
- Remove sysctl to turn this functionality on and off.
- Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this
functionality (currently only unjail root can use it).
Discussed with: julian, adrian, jhb, rwatson, kmacy
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.
This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
includes support for NFSv4. The subsystem can optionally be linked
into the kernel using the two options:
NFSCL - the client
NFSD - the server
It is also built as three modules:
nfscl - the client
nfsd - the server
nfscommon - functions shared by the client and server
Approved by: kib (mentor)
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.
This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.
PR: 135008
Add support for kernel fault injection using KFAIL_POINT_* macros and
fail_point_* infrastructure. Add example fail point in vfs_bio.c to
simulate VM buf pressure.
Approved by: dfr (mentor)
get a quick snapshot of the kernel's symbol table including the symbols
from any loaded modules (the symbols are all merged into one symbol
table). Unlike like other implementations, this ksyms driver maps
memory in the process memory space to store the snapshot at the time
/dev/ksyms is opened. It also checks to see if the process has already
a snapshot open and won't allow it to open /dev/ksyms it again until it
closes first. This prevents kernel and process memory from being
exhausted. Note that /dev/ksyms is used by the lockstat(1) command.
Reviewed by: gallatin kib (freebsd-arch)
Approved by: gnn (mentor)
adds probes for mutexes, reader/writer and shared/exclusive locks to
gather contention statistics and other locking information for
dtrace scripts, the lockstat(1M) command and other potential
consumers.
Reviewed by: attilio jhb jb
Approved by: gnn (mentor)
kernel option.
This also permits tuning of the option per virtual network stack, as
well as separately per inet, inet6.
The kernel option is left for a transition period, marked deprecated,
and will be removed soon.
Initially requested by: phk (1 year 1 day ago)
MFC after: 4 weeks
o replace DLT_IEEE802_11 support in net80211 with DLT_IEEE802_11_RADIO
and remove explicit bpf support from wireless drivers; drivers now
use ieee80211_radiotap_attach to setup shared data structures that
hold the radiotap header for each packet tx/rx
o remove rx timestamp from the rx path; it was used only by the tdma support
for debugging and was mostly useless due to it being 32-bits and mostly
unavailable
o track DLT_IEEE80211_RADIO bpf attachments and maintain per-vap and
per-com state when there are active taps
o track the number of monitor mode vaps
o use bpf tap and monitor mode vap state to decide when to collect radiotap
state and dispatch frames; drivers no longer explicitly directly check
bpf state or use bpf calls to tap frames
o handle radiotap state updates on channel change in net80211; drivers
should not do this (unless they bypass net80211 which is almost always
a mistake)
o update various drivers to be more consistent/correct in handling radiotap
o update ral to include TSF in radiotap'd frames
o add promisc mode callback to wi
Reviewed by: cbzimmer, rpaulo, thompsa
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.
Reviewed by: marcel, raj
Book-E parts by: raj
- For CPUs that only support MCE (the machine check exception) but not MCA
(i.e. Pentium), all this does is print out the value of the machine check
registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
a bit more involved.
- First, there is limited support for decoding the CPU-independent MCA
error codes in the kernel, and the kernel uses this to output a short
description of any machine check events that occur.
- When a machine check exception occurs, all of the MCx banks on the
current CPU are scanned and any events are reported to the console
before panic'ing.
- To catch events for correctable errors, a periodic timer kicks off a
task which scans the MCx banks on all CPUs. The frequency of these
checks is controlled via the "hw.mca.interval" sysctl.
- Userland can request an immediate scan of the MCx banks by writing
a non-zero value to "hw.mca.force_scan".
- If any correctable events are encountered, the appropriate details
are stored in a 'struct mca_record' (defined in <machine/mca.h>).
The "hw.mca.count" is a count of such records and each record may
be queried via the "hw.mca.records" tree by specifying the record
index (0 .. count - 1) as the next name in the MIB similar to using
PIDs with the kern.proc.* sysctls. The idea is to export machine
check events to userland for more detailed processing.
- The periodic timer and hw.mca sysctls are only present if the CPU
supports MCA.
Discussed with: emaste (briefly)
MFC after: 1 month
I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.
The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.
This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.
The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.
The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.
This change also introduces a DDB subcommand to show the list of all
vnet instances.
Approved by: julian (mentor)
to module builds. This avoids having to have the module builds walk up
the tree to find the kernel sources. It also allows a kernel + module
build to succeed when a new level of module subdirectories is added without
requiring that the /usr/share/mk/bsd.kmod.mk file on the machine be patched.
MFC after: 1 week
Broadcom BCM43xx chipsets. This driver uses the v3 firmware that
needs to be fetched separately. A port will be committed to create
the bwi firmware module.
The driver matches the following chips: Broadcom BCM4301, BCM4307,
BCM4306, BCM4309, BCM4311, BCM4312, BCM4318, BCM4319
The driver works for 802.11b and 802.11g.
Limitations:
This doesn't support the 802.11a or 802.11n portion of radios.
Some BCM4306 and BCM4309 cards don't work with Channel 1, 2 or 3.
Documenation for this firmware is reverse engineered from
http://bcm.sipsolutions.net/
V4 of the firmware is needed for 11a or 11n support
http://bcm-v4.sipsolutions.net/
Firmware needs to be fetched from a third party, port to be committed
# I've tested this with a BCM4319 mini-pci and a BCM4318 CardBus card, and
# not connected it to the build until the firmware port is committed.
Obtained from: DragonFlyBSD, //depot/projects/vap
Reviewed by: sam@, thompsa@
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
controller in the VIA southbridge functional in the CDS
(Configurable Development System) for MPC85XX.
The embedded USB controllers look operational but the
interrupt steering is still wrong.
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.
Reviewed by: bms
MFC after: 3 weeks
- Add and use mapping of fcntl(2) commands to new BSM constant space.
- Adopt (int) rather than (long) arguments to a number of auditon(2)
commands, as has happened in Solaris, and add compatibility code to
handle the old comments.
Note that BSM_PF_IEEE80211 is partially but not fully removed, as the
userspace OpenBSM 1.1alpha5 code still depends on it. Once userspace
is updated, I'll GCC the kernel constant.
MFC after: 2 weeks
Sponsored by: Apple, Inc.
Obtained from: TrustedBSD Project
Portions submitted by: sson
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
runtime with 'sysctl net.inet.flowtable.enable=1'.
- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
their kernel config files.
- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
more review before bringing in changes that require more churn.
Supported by: Bitgravity Inc.
naming of the partitions (GEOM_PART_EBR_COMPAT). When
compatibility is enabled, changes to the partitioning are
disallowed.
Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.
Enable compatibility on amd64 and i386.
in sys/nfs/nfs_nfssvc.c by registering with it using the
nfsd_call_nfsserver function pointer. Also, add the build glue for
nfs_nfssvc.c optionally based on "nfsserver" and also as a loadable
module.
Submitted by: rmacklem
Reviewed by: kib
Approved by: kib (mentor)
on a generic dumper that creates an ELF core file and
uses PMAP functions to scan and iterate over memory
chunks, as well as handle memory mappings used during
dumping.
the PMAP layer can choose to return physical memory
chunks or virtual memory chunks. For minidumps, the
chunks should be virtual.
The default MMU I/F implementation for the scan_md()
method returns NULL. Thus, when a PMAP implementation
does not implement the required methods, an empty
core file is created. Here, empty means having an ELF
header only.
Obtained from: Juniper Networks
provided, for example, on the PowerPC 970 (G5), as well as on related CPUs
like the POWER3 and POWER4.
This also adds support for various built-in hardware found on Apple G5
hardware (e.g. the IBM CPC925 northbridge).
Reviewed by: grehan
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.
Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).
Implement support for TSS i/o port access permission bitmap for amd64.
Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.
Implement amd64-specific compat shims for sysarch.
Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.
TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-depended ptrace command is added to set segment
registers for debugged process.
In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin