UPDATING by URL.
As there has been some confusion over the need to run "mergemaster -p",
part of our standard upgrade procedure, following the recent addition of
an "auditdistd" user, add a note about it to UPDATING explicitly.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.
Suggested by: andre
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.
After this change a packet processed by the stack isn't
modified at all[2] except for TTL.
After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.
[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.
[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.
Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
now use function calls:
if_clone_simple()
if_clone_advanced()
to initialize a cloner, instead of macros that initialize if_clone
structure.
Discussed with: brooks, bz, 1 year ago
sdchi encapsulates a generic SD Host Controller logic that relies on
actual hardware driver for register access.
sdhci_pci implements driver for PCI SDHC controllers using new SDHCI
interface
No kernel config modifications are required, but if you load sdhc
as a module you must switch to sdhci_pci instead.
This has been developed during 2 summer of code mandates and being revived
by gnn recently.
The functionality in this commit mirrors entirely content of fusefs-kmod
port, which doesn't need to be installed anymore for -CURRENT setups.
In order to get some sparse technical notes, please refer to:
http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
or to the project branch:
svn://svn.freebsd.org/base/projects/fuse/
which also contains granular history of changes happened during port
refinements. This commit does not came from the branch reintegration
itself because it seems svn is not behaving properly for this functionaly
at the moment.
Partly Sponsored by: Google, Summer of Code program 2005, 2011
Originally submitted by: ilya, Csaba Henk <csaba-ml AT creo DOT hu >
In collabouration with: pho
Tested by: flo, gnn, Gustau Perez,
Kevin Oberman <rkoberman AT gmail DOT com>
MFC after: 2 months
- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet
to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet
to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules:
pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.
Reviewed by: ray, luigi, eri, melifaro
Tested by: glebius (LE), ray (BE)
all diskN aliases for providers (which more or less corresponds to how the
x86 version behaves) but instead probe only those listed in the boot-device
OFW environment variable. This has the following advantages:
- avoids otherwise unavoidable OFW warnings about failures to open disks
for which aliases exist but no actual hardware is connected
- avoids issues due to different diskN naming schemes
- aligns us with Solaris
MFC after: 3 days
This makes our naming scheme more closely match other systems and the
expectations of much third-party software. MIPS builds which are little-endian
should require and exhibit no changes. Big-endian TARGET_ARCHes must be
changed:
From: To:
mipseb mips
mipsn32eb mipsn32
mips64eb mips64
An entry has been added to UPDATING and some foot-shooting protection (complete
with warnings which should become errors in the near future) to the top-level
base system Makefile.
platforms.
This will make every attempt to mount a non-mpsafe filesystem to the
kernel forbidden, unless it is expressely compiled with
VFS_ALLOW_NONMPSAFE option.
This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.
No MFC is expected for this patch.
operations for setting and accessing vnode's v_socket field.
The operations are necessary to implement proper unix socket handling
on layered file systems like nullfs(5).
This change fixes the long standing issue with nullfs(5) being in that
unix sockets did not work between lower and upper layers: if we bound
to a socket on the lower layer we could connect only to the lower
path; if we bound to the upper layer we could connect only to the
upper path. The new behavior is one can connect to both the lower and
the upper paths regardless what layer path one binds to.
PR: kern/51583, kern/159663
Suggested by: kib
Reviewed by: arch
MFC after: 2 weeks
conditional code parts not used by or applicable to FreeBSD.
The new implementation is supposed to be able to cope with changes to
the 'l' versions of the msghdr structs now used as well as to if_data
allowing future changes without breaking things.
This restores carp(4) config support in HEAD after r231504.
Reviewed by: glebius, brooks
MFC After: 3 months
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.
To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.
This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.
No MFC is expected for this patch.
Tested by: gianni
Reviewed by: kib
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).
PR: 124667
Obtained from: NetBSD (based on)
MFC after: 3 days
digit beyond your time.
Various sysinstall dependencies (e.g. libftpio, libdisk, libodialog, etc.)
will be cleaned up in coming days. Some will take longer than others due to
a few other consumers (tzsetup and sade).
on vfc_name to set vfc_typenum, so that vfc_typenum doesn't
change when file systems are loaded in different orders. This
keeps NFS file handles from changing, for file systems that
use vfc_typenum in their fsid. This change is controlled via
a loader.conf variable called vfs.typenumhash, since vfc_typenum
will change once when this is enabled. It defaults to 1 for
9.0, but will default to 0 when MFC'd to stable/8.
Tested by: hrs
Reviewed by: jhb, pjd (earlier version)
Approved by: re (kib)
MFC after: 1 month
also capability-related changes to fget(9). This is likely not part of
a formal KPI, but the nvidia driver (at least) uses it.
Mention /dev/{stdin,stdout,stderr} breakage that appears in certain
kernel revisions as best avoided!
Approved by: re (xxx)
The code has definitely been broken for SCHED_ULE, which is a default
scheduler. It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt devilery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup. See the UPDATING entry for details.
Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all availble CPUs and then
the disabled CPUs should be "subtracted" from it. That doesn't work
well if the resulting topology becomes non-uniform.
This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.
PR: kern/145385
Discussed with: gcooper, ambrisko, mdf, sbruno
Reviewed by: attilio
Tested by: pho, pluknet
X-MFC after: never
family detection in world, mostly noticed by ifconfig(8), when running with
an old kernel.
Reported by: Andrzej Tobola (ato iem.pw.edu.pl)
Reported by: gcooper
Some files keep the SUN4V tags as a code reference, for the future,
if any rewamped sun4v support wants to be added again.
Reviewed by: marius
Tested by: sbruno
Approved by: re
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
- To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
- Add some more details to UPDATING.
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).
ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
referred to as the experimental server. It also adds a new command
line option "-o" to both mountd and nfsd that forces them to use the
old/regular NFS server. The "-e" option for these commands is now
a no-op, since the new server is the default. I will be committing rc
script and man changes soon. Discussed on freebsd-fs@.
x86 CPU support, better support for powerpc64, some new directives, and
many other things. Bump __FreeBSD_version, and add a note to UPDATING.
Thanks to the many people that have helped to test this.
Obtained from: projects/binutils-2.17
misnamed since it was introduced and should not be globally exposed
with this name. The equivalent functionality is now available using
kern_yield(curthread->td_user_pri). The function remains
undocumented.
Bump __FreeBSD_version.
The code is turned off until the tree is fixed up so it compiles.
__FreeBSD_version was already bumped once today, so skip the bump, but
add an entry to UPDATING.
Note that __DESCR() is used in the SYSCTL_OID() macro and so is not
needed in macros that invoke it. This use was inconsistent in the
file and I have made it consistent any lines already being changed.
Reviewed by: bde (previous version), -arch (previous version)
dialog is distributed from GPLv2 to LGPLv2 and introduces a number of new
features and a new and better libdialog API. The existing libdialog will
be kept temporarily as libodialog for compatibility purposes until sade,
sysinstall and tzsetup have been either updated or replaced.
__FreeBSD_version is now 900030.
Discussed on: -current
Approved by: core
Obtained from: http://invisible-island.net/dialog
access inbound/outbound events and associated data for established TCP
connections. The hooks only run if at least one hook function is registered
for the hook point, ensuring the impact on the stack is effectively nil when
no TCP Khelp modules are loaded. struct tcp_hhook_data is passed as contextual
data to any registered Khelp module hook functions.
- Add an OSD (Object Specific Data) pointer to struct tcpcb to allow Khelp
modules to associate per-connection data with the TCP control block.
- Bump __FreeBSD_version and add a note to UPDATING regarding to ABI changes
introduced by this commit and r216753.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz, others along the way
MFC after: 3 months
it before, your rc scripts may still reference old files/directories and
if you are in the unlucky situation to have triggered a reboot (intentionally
or not) between the delete-old run and the mergemaster, your system may not
start anymore.
While I'm here, give a hint about delete-old-libs.
Noticed by: bcr (luckily in a discussion and not by getting hit by this)
MFC after: 1 week
support in mii(4):
- Merge generic flow control advertisement (which can be enabled by
passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
IFM_FLOW isn't implemented as a global option via the "don't care
mask" but instead as a media specific option this. This has the
following advantages:
o allows flow control advertisement with autonegotiation to be
turned on and off via ifconfig(8) with the default typically
being off (though MIIF_FORCEPAUSE has been added causing flow
control to be always advertised, allowing to easily MFC this
changes for drivers that previously used home-grown support for
flow control that behaved that way without breaking POLA)
o allows to deal with PHY drivers where flow control advertisement
with manual selection doesn't work or at least isn't implemented,
like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
by setting MIIF_NOMANPAUSE
o the available combinations of media options are readily available
from the `ifconfig -m` output
- Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
- Change IFM_ETH_MASTER from being implemented as a global option via
the "don't care mask" to a media specific one as it actually is only
applicable to IFM_1000_T to date.
- Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
actually configure manually selected slave mode (like we also do in
the PHY specific implementations).
- Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
e1000phy(4) and ip1000phy(4) to use the generic flow control support
instead of home-grown solutions via IFM_FLAGs. This includes changing
these PHY drivers and smcphy(4) to no longer unconditionally advertise
support for flow control but only if the selected media has IFM_FLOW
set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
the contents of the BMSR via mii_phy_add_media() instead of hardcoding
them. The latter approach seems to have developed historically, besides
causing unnecessary code duplication it was also undesirable because
brgphy_mii_phy_auto() already based the capability advertisement on the
contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
just BCM5701. Apparently this was a misinterpretation of a workaround
in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
this doesn't mean we can't set these as well on other PHYs for manual
media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
IFM_1000_T master mode support now is generally available with all PHY
drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
not applicable there.
Reviewed by: yongari (plus additional testing)
Obtained from: NetBSD (partially), OpenBSD (partially)
MFC after: 2 weeks
Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details
about the project are available at: http://caia.swin.edu.au/freebsd/5cc/
- Add a KPI and supporting infrastructure to allow modular congestion control
algorithms to be used in the net stack. Algorithms can maintain per-connection
state if required, and connections maintain their own algorithm pointer, which
allows different connections to concurrently use different algorithms. The
TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
programmatically query or change the congestion control algorithm respectively
from within an application at runtime.
- Integrate the framework with the TCP stack in as least intrusive a manner as
possible. Care was also taken to develop the framework in a way that should
allow integration with other congestion aware transport protocols (e.g. SCTP)
in the future. The hope is that we will one day be able to share a single set
of congestion control algorithm modules between all congestion aware transport
protocols.
- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
and use it to decouple the meaning of recovery from a congestion event and
recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
congestion control protocols don't generally need to recover from packet loss
and need a different way to note a congestion recovery episode within the
stack.
- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
and ensures the stack always uses the appropriate mechanisms for recovering
from packet loss during a congestion recovery episode.
- Extract the NewReno congestion control algorithm from the TCP stack and
massage it into module form. NewReno is always built into the kernel and will
remain the default algorithm for the forseeable future. Implementations of
additional different algorithms will become available in the near future.
- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code
that relies on the size of "struct tcpcb" is required.
Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: rpaulo
Tested by: David Hayes (and many others over the years)
MFC after: 3 months
The $ip6addrctl_policy is a variable to choose a pre-defined address
selection policy set by ip6addrctl(8).
The keyword "ipv4_prefer" sets IPv4-preferred one described in Section 10.3,
the keyword "ipv6_prefer" sets IPv6-preferred one in Section 2.1 in RFC 3484,
respectively. When "AUTO" is specified, it attempts to read
/etc/ip6addrctl.conf first. If it is found, it reads and installs it as
a policy table. If not, either of the two pre-defined policy tables is
chosen automatically according to $ipv6_activate_all_interfaces.
When $ipv6_activate_all_interfaces=NO, interfaces which have no corresponding
$ifconfig_IF_ipv6 is marked as IFDISABLED for security reason.
The default values are ip6addrctl_policy=AUTO and
ipv6_activate_all_interfaces=NO.
Discussed with: ume and bz
Deliverables: Small and clean code (1,4 KSLOC vs GNU's 8,5 KSLOC),
lower memory usage than GNU grep, GNU compatibility,
BSD license.
TODO: Performance is somewhat behind GNU grep but it is only
significant for bigger searches. The reason is complex, the
most important factor is that GNU grep uses lots of
optimizations to improve the speed of the regex library.
First, we need a modern regex library (practically by adopting
TRE), add support for GNU-style non-standard regexes and then
reevalute the performance issues and look for bottlenecks. In
the meantime, for those, who need better performance, it is
possible to build GNU grep by setting WITH_GNU_GREP.
Approved by: delphij (mentor)
Obtained from: OpenBSD (http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/),
freegrep (http://github.com/howardjp/freegrep)
Sponsored by: Google SoC 2008
Portbuild tests run by: kris, pav, erwin
Acknowledgements to: fjoe (as SoC 2008 mentor),
everyone who helped in reviewing and testing
Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.
in Solaris 10 updates 141445-09 and 142901-14.
Detailed information:
(OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)
7844:effed23820ae
6755435 zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)
7897:e520d8258820
6748436 inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)
7965:b795da521357
6740164 zpool attach can create an illegal root pool (141909-02)
8084:b811cc60d650
6769612 zpool_import() will continue to write to cachefile even if altroot is set (N/A)
8121:7fd09d4ebd9c
6757430 want an option for zdb to disable space map loading and leak tracking (141445-01)
8129:e4f45a0bfbb0
6542860 ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)
8188:fd00c0a81e80
6761100 want zdb option to select older uberblocks (141445-01)
8190:6eeea43ced42
6774886 zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)
8225:59a9961c2aeb
6737463 panic while trying to write out config file if root pool import fails (141445-01)
8227:f7d7be9b1f56
6765294 Refactor replay (141445-01)
8228:51e9ca9ee3a5
6572357 libzfs should do more to avoid mnttab lookups (141909-01)
6572376 zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)
8241:5a60f16123ba
6328632 zpool offline is a bit too conservative (141445-01)
6739487 ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01)
6767129 ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01)
6747698 checksum failures after offline -t / export / import / scrub (141445-01)
6745863 ZFS writes to disk after it has been offlined (141445-01)
6722540 50% slowdown on scrub/resilver with certain vdev configurations (141445-01)
6759999 resilver logic rewrites ditto blocks on both source and destination (141445-01)
6758107 I/O should never suspend during spa_load() (141445-01)
6776548 codereview(1) runs off the page when faced with multi-line comments (N/A)
6761406 AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)
8242:e46e4b2f0a03
6770866 GRUB/ZFS should require physical path or devid, but not both (141445-01)
8269:03a7e9050cfd
6674216 "zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01)
6621164 $SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01)
6635482 i18n problems in libzfs_dataset.c and zfs_main.c (141445-01)
6595194 "zfs get" VALUE column is as wide as NAME (141445-01)
6722991 vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01)
6396518 ASSERT strings shouldn't be pre-processed (141445-01)
8274:846b39508aff
6713916 scrub/resilver needlessly decompress data (141445-01)
8343:655db2375fed
6739553 libzfs_status msgid table is out of sync (141445-01)
6784104 libzfs unfairly rejects numerical values greater than 2^63 (141445-01)
6784108 zfs_realloc() should not free original memory on failure (141445-01)
8525:e0e0e525d0f8
6788830 set large value to reservation cause core dump (141445-01)
6791064 want sysevents for ZFS scrub (141445-01)
6791066 need to be able to set cachefile on faulted pools (141445-01)
6791071 zpool_do_import() should not enable datasets on faulted pools (141445-01)
6792134 getting multiple properties on a faulted pool leads to confusion (141445-01)
8547:bcc7b46e5ff7
6792884 Vista clients cannot access .zfs (141445-01)
8632:36ef517870a3
6798384 It can take a village to raise a zio (141445-01)
8636:7e4ce9158df3
6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01)
6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01)
6702206 ZFS read/writer lock contention throttles sendfile() benchmark (141445-01)
6780491 Zone on a ZFS filesystem has poor fork/exec performance (141445-01)
6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)
8692:692d4668b40d
6801507 ZFS read aggregation should not mind the gap (141445-01)
8697:e62d2612c14d
6633095 creating a filesystem with many properties set is slow (141445-01)
8768:dfecfdbb27ed
6775697 oracle crashes when overwriting after hitting quota on zfs (141909-01)
8811:f8deccf701cf
6790687 libzfs mnttab caching ignores external changes (141445-01)
6791101 memory leak from libzfs_mnttab_init (141445-01)
8845:91af0d9c0790
6800942 smb_session_create() incorrectly stores IP addresses (N/A)
6582163 Access Control List (ACL) for shares (141445-01)
6804954 smb_search - shortname field should be space padded following the NULL terminator (N/A)
6800184 Panic at smb_oplock_conflict+0x35() (N/A)
8876:59d2e67b4b65
6803822 Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)
8924:5af812f84759
6789318 coredump when issue zdb -uuuu poolname/ (141445-01)
6790345 zdb -dddd -e poolname coredump (141445-01)
6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01)
6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01)
6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)
9030:243fd360d81f
6815893 hang mounting a dataset after booting into a new boot environment (141445-01)
9056:826e1858a846
6809691 'zpool create -f' no longer overwrites ufs infomation (141445-01)
9179:d8fbd96b79b3
6790064 zfs needs to determine uid and gid earlier in create process (141445-01)
9214:8d350e5d04aa
6604992 forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01)
6810367 assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)
9229:e3f8b41e5db4
6807765 ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)
9230:e4561e3eb1ef
6821169 offlining a device results in checksum errors (141445-01)
6821170 ZFS should not increment error stats for unavailable devices (141445-01)
6824006 need to increase issue and interrupt taskqs threads in zfs (141445-01)
9234:bffdc4fc05c4
6792139 recovering from a suspended pool needs some work (141445-01)
6794830 reboot command hangs on a failed zfs pool (141445-01)
9246:67c03c93c071
6824062 System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)
9276:a8a7fc849933
6816124 System crash running zpool destroy on broken zpool (141445-03)
9355:09928982c591
6818183 zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)
9391:413d0661ef33
6710376 log device can show incorrect status when other parts of pool are degraded (141445-03)
9396:f41cf682d0d3 (part already merged)
6501037 want user/group quotas on ZFS (141445-03)
6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03)
6815592 panic: No such hold X on refcount Y from zfs_znode_move (141445-03)
6759986 zfs list shows temporary %clone when doing online zfs recv (141445-03)
9404:319573cd93f8
6774713 zfs ignores canmount=noauto when sharenfs property != off (141445-03)
9412:4aefd8704ce0
6717022 ZFS DMU needs zero-copy support (141445-03)
9425:e7ffacaec3a8
6799895 spa_add_spares() needs to be protected by config lock (141445-03)
6826466 want to post sysevents on hot spare activation (141445-03)
6826468 spa 'allowfaulted' needs some work (141445-03)
6826469 kernel support for storing vdev FRU information (141445-03)
6826470 skip posting checksum errors from DTL regions of leaf vdevs (141445-03)
6826471 I/O errors after device remove probe can confuse FMA (141445-03)
6826472 spares should enjoy some of the benefits of cache devices (141445-03)
9443:2a96d8478e95
6833711 gang leaders shouldn't have to be logical (141445-03)
9463:d0bd231c7518
6764124 want zdb to be able to checksum metadata blocks only (141445-03)
9465:8372081b8019
6830237 zfs panic in zfs_groupmember() (141445-03)
9466:1fdfd1fed9c4
6833162 phantom log device in zpool status (141445-03)
9469:4f68f041ddcd
6824968 add ZFS userquota support to rquotad (141445-03)
9470:6d827468d7b5
6834217 godfather I/O should reexecute (141445-03)
9480:fcff33da767f
6596237 Stop looking and start ganging (141909-02)
9493:9933d599bc93
6623978 lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)
9512:64cafcbcc337
6801810 Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)
9515:d3b739d9d043
6586537 async zio taskqs can block out userland commands (142901-09)
9554:787363635b6a
6836768 zfs_userspace() callback has no way to indicate failure (N/A)
9574:1eb6a6ab2c57
6838062 zfs panics when an error is encountered in space_map_load() (141909-02)
9583:b0696cd037cc
6794136 Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)
9630:e25a03f552e0
6776104 "zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)
9653:a70048a304d1
6664765 Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)
9688:127be1845343
6841321 zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A)
6843069 zfs get userused@S-1-... doesn't work (N/A)
9873:8ddc892eca6e
6847229 assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)
9904:d260bd3fd47c
6838344 kernel heap corruption detected on zil while stress testing (141445-06)
9951:a4895b3dd543
6844900 zfs_ioc_userspace_upgrade leaks (N/A)
10040:38b25aeeaf7a
6857012 zfs panics on zpool import (141445-06)
10000:241a51d8720c
6848242 zdb -e no longer works as expected (N/A)
10100:4a6965f6bef8
6856634 snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)
10160:a45b03783d44
6861983 zfs should use new name <-> SID interfaces (N/A)
6862984 userquota commands can hang (141445-06)
10299:80845694147f
6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)
10302:a9e3d1987706
6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)
10575:2a8816c5173b (partial merge)
6882227 spa_async_remove() shouldn't do a full clear (142901-14)
10800:469478b180d9
6880764 fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09)
6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)
10801:e0bf032e8673 (partial merge)
6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)
10810:b6b161a6ae4a
6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)
10890:499786962772
6807339 spurious checksum errors when replacing a vdev (142901-13)
11249:6c30f7dfc97b
6906110 bad trap panic in zil_replay_log_record (142901-13)
6906946 zfs replay isn't handling uid/gid correctly (142901-13)
11454:6e69bacc1a5a
6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)
11546:42ea6be8961b (partial merge)
6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)
Discussed with: pjd
Approved by: delphij (mentor)
Obtained from: OpenSolaris (multiple Bug IDs)
MFC after: 2 months
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
are some problems with static executables), make.conf (would also
affect ports which do not use GNU make and do not override the
compile targets) or in the kernel config (via "makeoptions
WITH_CTF=yes").
Additional (related) changes:
- propagate WITH_CTF to module builds
- do not add -g to the linker flags, it's a noop there anyway
(at least according to the man page of ld)
- do not add -g to CFLAGS unconditionally
we need to have a look if it is really needed (IMO not) or if there
is a way to add it only when WITH_CTF is used
Note: ctfconvert / ctfmerge lines will not appear in the build output,
to protect the innocent (those which do not build with WITH_CTF would
see the shell-test and may think WITH_CTF is used).
Reviewed by: imp, jhb, scottl (earlier version)
Discussed on: arch@
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.
Reviewed by: kib, jhb
This utility allows users to convert their wtmp databases to the new
format. It makes no sense for users to keep their wtmp log files if they
are unable to view them.
It basically copies ut_line into ut_id as well. This makes it possible
for last(1) and ac(8) to match login records with their corresponding
logout record.
While the name is pretentious, a good explanation of its targets is
reported in this 17 months old presentation e-mail:
http://lists.freebsd.org/pipermail/freebsd-arch/2008-August/008452.html
In order to implement it, the sq_type in sleepqueues is mandatory and not
only compiled along with INVARIANTS option. Additively, a new sleepqueue
function, sleepq_type() is added, returning the type of the sleepqueue
linked to a wchan.
Three new sysctls are added in order to configure the thread:
debug.deadlkres.slptime_threshold
debug.deadlkres.blktime_threshold
debug.deadlkres.sleepfreq
rappresenting the thresholds for sleep and block time that will lead to
a deadlock matching (when exceeded), while the sleepfreq rappresents the
number of seconds between 2 consecutive thread runnings.
In order to enable the deadlock resolver thread recompile your kernel
with the option DEADLKRES.
Reviewed by: jeff
Tested by: pho, Giovanni Trematerra
Sponsored by: Nokia Incorporated, Sandvine Incorporated
MFC after: 2 weeks
Right now syscons(4) uses a cons25-style terminal emulator. The
disadvantages of that are:
- Little compatibility with embedded devices with serial interfaces.
- Bad bandwidth efficiency, mainly because of the lack of scrolling
regions.
- A very hard transition path to support for modern character sets like
UTF-8.
Our terminal emulation library, libteken, has been supporting
xterm-style terminal emulation for months, so flip the switch and make
everyone use an xterm-style console driver.
I still have to enable this on i386. Right now pc98 and i386 share the
same /etc/ttys file. I'm not going to switch pc98, because it uses its
own Kanji-capable cons25 emulator.
IMPORTANT: What to do if things go wrong (i.e. graphical artifacts):
- Run the application inside script(1), try to reduce the problem and
send me the log file.
- In the mean time, you can run `vidcontrol -T cons25' and `export
TERM=cons25' so you can run applications the same way you did before.
You can also build your kernel with `options TEKEN_CONS25' to make all
virtual terminals use the cons25 emulator by default.
Discussed on: current@
re-add $ipv6_enable support for backward compatibility. From
UPDATING:
1. To use IPv6, simply define $ifconfig_IF_ipv6 like $ifconfig_IF
for IPv4. For aliases, $ifconfig_IF_aliasN should be used.
Note that both variables need the "inet6" keyword at the head.
Do not set $ipv6_network_interfaces manually if you do not
understand what you are doing. It is not needed in most cases.
$ipv6_ifconfig_IF and $ipv6_ifconfig_IF_aliasN still work, but
they are obsolete.
2. $ipv6_enable is obsolete. Use $ipv6_prefer and/or
"inet6 accept_rtadv" keyword in ifconfig(8) instead.
If you define $ipv6_enable=YES, it means $ipv6_prefer=YES and
all configured interfaces have "inet6 accept_rtadv" in the
$ifconfig_IF_ipv6. These are for backward compatibility.
3. A new variable $ipv6_prefer has been added. If NO, IPv6
functionality of interfaces with no corresponding
$ifconfig_IF_ipv6 is disabled by using "inet6 ifdisabled" flag,
and the default address selection policy of ip6addrctl(8)
is the IPv4-preferred one (see rc.d/ip6addrctl for more details).
Note that if you want to configure IPv6 functionality on the
disabled interfaces after boot, first you need to clear the flag by
using ifconfig(8) like:
ifconfig em0 inet6 -ifdisabled
If YES, the default address selection policy is set as
IPv6-preferred.
The default value of $ipv6_prefer is NO.
4. If your system need to receive Router Advertisement messages,
define "inet6 accept_rtadv" in $ifconfig_IF_ipv6. The rc(8)
scripts automatically invoke rtsol(8) when the interface becomes
UP. The Router Advertisement messages are used for SLAAC
(State-Less Address AutoConfiguration).
df(1) and mount(8) output. This is a bit smilar to OpenSolaris and follows
ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in
mount(8) and df(1) output by default.
Reviewed by: kib
MFC after: 3 days
o remove all entries before RELENG_7 was branched, as is tradition[*].
o Update examples... nobody cares about 5.x upgrades.
o minor format tweaking in a few places.
o update copyright (although at best I hold an editors copyright these days).
o Remove giving people permission to buy me beer. I don't do enough for
this document for that anymore...
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding
This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.
Please don't forget to update your config file with the STOP_NMI
option removal
Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
preparation for 8.0-RELEASE. Add the previous version of those
libraries to ObsoleteFiles.inc and bump __FreeBSD_Version.
Reviewed by: kib
Approved by: re (rwatson)
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
the TCP syncache. This returns struct tcpopt to being private within the TCP
implementation, thus allowing it to be modified without ABI concerns.
The patch breaks the ABI. Bump __FreeBSD_version to 800103 accordingly. The cxgb
driver is the only TOE consumer affected by this change, and needs to be
recompiled along with the kernel.
Suggested by: rwatson
Reviewed by: rwatson, kmacy
Approved by: re (kensmith), kensmith (mentor temporarily unavailable)
back to the 8 branch:
tcp_var.h
- struct sackhint
- struct tcpcb
- struct tcpstat
The patch breaks the ABI. Bump __FreeBSD_version to 800102 accordingly. User
space tools that rely on the size of any of these structs (e.g. sockstat) need
to be recompiled.
Reviewed by: rpaulo, sam, andre, rwatson
Approved by: re & mentor (gnn)
little purpose and are unused in the base system.
The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.
Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.
Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.
Reviewed by: rwatson
Discussed with: bz, zec
Approved by: re@ (kensmith)
FreeBSD docset during 'make release' this will speed up release
builds;
- sysinstall(8) has also been updated to use these packages with a new
menu allowing people to choose what localized doc to install;
- mention in UPDATING that docs from the FreeBSD Documentation project
are now installed in /usr/local/share/doc/freebsd instead of
/usr/share/doc.
Approved by: re (kensmith)
require COMPAT_FREEBSD7. Also, explicitly note in NOTES that any version
of COMPAT_FREEBSD<n> effectively requires for newer binaries (i.e.
COMPAT_FREEBSD<n+1>, etc.). While this has been true in practice
previously, it used to compile ok before the commit earlier this week.
Discussed with: peter
Approved by: re (kensmith)
Vimage module, which had been there already but now is stateful.
All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.
Add a missing function local variable in case of MPATHing.
Reviewed by: zec
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)
The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.
Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.
Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.
Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.
Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867
Actually, as it did receive few tuning, the support is disabled by
default, but it can opt-in with the option ADAPTIVE_LOCKMGRS.
Due to the nature of lockmgrs, adaptive spinning needs to be
selectively enabled for any interested lockmgr.
The support is bi-directional, or, in other ways, it will work in both
cases if the lock is held in read or write way. In particular, the
read path is passible of further tunning using the sysctls
debug.lockmgr.retries and debug.lockmgr.loops . Ideally, such sysctls
should be axed or compiled out before release.
Addictionally note that adaptive spinning doesn't cope well with
LK_SLEEPFAIL. The reason is that many (and probabilly all) consumers
of LK_SLEEPFAIL are mainly interested in knowing if the interlock was
dropped or not in order to reacquire it and re-test initial conditions.
This directly interacts with adaptive spinning because lockmgr needs
to drop the interlock while spinning in order to avoid a deadlock
(further details in the comments inside the patch).
Final note: finding someone willing to help on tuning this with
relevant workloads would be either very important and appreciated.
Tested by: jeff, pho
Requested by: many
network stack when reentering the inbound path from netgraph, and
force queueing of mbufs at the outbound netgraph node.
The mechanism relies on two components. First, in netgraph nodes
where outbound path of the network stack calls into netgraph, the
current thread has to be appropriately marked using the new
NG_OUTBOUND_THREAD_REF() macro before proceeding to call further
into the netgraph topology, and unmarked using the
NG_OUTBOUND_THREAD_UNREF() macro before returning to the caller.
Second, netgraph nodes which can potentially reenter the network
stack in the inbound path have to mark their inbound hooks using
NG_HOOK_SET_TO_INBOUND() macro. The netgraph framework will then
detect when there is a danger of a call graph looping back from
outbound to inbound path via netgraph, and defer handing off the
mbufs to the "inbound" node to a worker thread with a clean stack.
In this first pass only the most obvious netgraph nodes have been
updated to ensure no outbound to inbound calls can occur. Nodes
such as ng_ipfw, ng_gif etc. should be further examined whether a
potential for outbound to inbound call looping exists.
This commit changes the layout of struct thread, but due to
__FreeBSD_version number shortage a version bump has been omitted
at this time, nevertheless kernel and modules have to be rebuilt.
Reviewed by: julian, rwatson, bz
Approved by: julian (mentor)
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.
While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.
Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.
Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)
Some time ago Tom Rhodes sent me an email that he was willing to perform
various cleanups to the window(1) source code. After some discussion, we
both decided the best thing to do, was to move window(1) to the ports
tree. The application isn't used a lot nowadays, mainly because it has
been superseeded by screen, tmux, etc.
A couple of hours ago Tom committed window(1) to ports (misc/window), so
I'm removing it from the tree. I don't think people will really miss it,
but I'm describing the change in UPDATING anyway.
Discussed with: trhodes, pav, kib
Approved by: re
an accessor function to get the correct rnh pointer back.
Update netstat to get the correct pointer using kvm_read()
as well.
This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.
Reviewed by: julian, rwatson, zec
X-MFC: not possible
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.
This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.
Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.
Approved by: bz (mentor)
to optionally have overlapping unit numbers if attached in different
vnets.
At this stage if_loop is the only clonable ifnet class that has been
extended to allow for such overlapping allocation of unit numbers, i.e.
in each vnet it is possible to have a lo0 interface. Other clonable ifnet
classes remain to operate with traditional semantics, i.e. each instance
of a clonable ifnet will be assigned a globally unique unit number,
regardless in which vnet such an ifnet becomes instantiated.
While here, garbage collect unused _lo_list field in struct vnet_net,
as well as improve indentation for #defines in sys/net/vnet.h.
The layout of struct vnet_net has changed, therefore bump
__FreeBSD_version.
This change has no functional impact on nooptions VIMAGE kernel builds.
Reviewed by: bz, brooks
Approved by: julian (mentor)
Upgrade of the tzcode from 2004a to 2009e.
Changes are numerous, but include...
- New format of the output of zic, which supports both 32 and 64
bit time_t formats.
- zdump on 64 bit platforms will actually produce some output instead
of doing nothing for a looooooooong time.
- linux_base-fX, with X >= at least 8, will work without problems related
to the local time again.
The original patch, based on the 2008e, has been running for a long
time on both my laptop and desktop machine and have been tested by
other people.
After the installation of this code and the running of zic(8), you
need to run tzsetup(8) again to install the new datafile.
Approved by: wollman@ for usr.sbin/zic
MFC after: 1 month
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:
1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:
options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet
2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:
INIT_VNET_NET(ifp->if_vnet); becomes
struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];
3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.
4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.
5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.
6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.
Reviewed by: bz, rwatson
Approved by: julian (mentor)
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
- add show as alias for get
- add weights to allow mpath to do more than equal cost
- add sticky / nostick to disable / re-enable per-connection load balancing
This adds a field to rt_metrics_lite so network bits of world will need to be re-built.
Reviewed by: jeli & qingli
driver in Linux 2.6. uscanner was just a simple wrapper around a fifo and
contained no logic, the default interface is now libusb (supported by sane).
Reviewed by: HPS
This is purely a forwarding plane cleanup; no control plane
code is involved.
Summary:
* Split IPv4 and IPv6 MROUTING support. The static compile-time
kernel option remains the same, however, the modules may now
be built for IPv4 and IPv6 separately as ip_mroute_mod and
ip6_mroute_mod.
* Clean up the IPv4 multicast forwarding code to use BSD queue
and hash table constructs. Don't build our own timer abstractions
when ratecheck() and timevalclear() etc will do.
* Expose the multicast forwarding cache (MFC) and virtual interface
table (VIF) as sysctls, to reduce netstat's dependence on libkvm
for this information for running kernels.
* bandwidth meters however still require libkvm.
* Make the MFC hash table size a boot/load-time tunable ULONG,
net.inet.ip.mfchashsize (defaults to 256).
* Remove unused members from struct vif and struct mfc.
* Kill RSVP support, as no current RSVP implementation uses it.
These stubs could be moved to raw_ip.c.
* Don't share locks or initialization between IPv4 and IPv6.
* Don't use a static struct route_in6 in ip6_mroute.c.
The v6 code is still using a cached struct route_in6, this is
moved to mif6 for the time being.
* More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.
v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.
Reviewed by: Pavlin Radoslavov
IPv4 stack.
Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.
__FreeBSD_version is bumped to 800070.
ports tree so that programs use libusb from the base by default. Thanks to
Stanislav Sedov for sorting out the ports build.
Bump __FreeBSD_version to 800069
Help and testing by: stas
memory from int to size_t. Implement a workaround for current ABI not
allowing to properly save size for and report more then 2Gb sized segment
of shared memory.
This makes it possible to use > 2 Gb shared memory segments on 64bit
architectures. Please note the new BUGS section in shmctl(2) and
UPDATING note for limitations of this temporal solution.
Reviewed by: csjp
Tested by: Nikolay Dzham <i levsha org ua>
MFC after: 2 weeks
from the inet6 stack along with statistics and make sure we
properly free the rt in all cases.
While the current situation is not better performance wise it
prevents panics seen more often these days.
After more inet6 and ipsec cleanup we should be able to improve
the situation again passing the rt to ip6_forward directly.
Leave the ip6_forward_rt entry in struct vinet6 but mark it
for removal.
PR: kern/128247, kern/131038
MFC after: 25 days
Committed from: Bugathon #6
Tested by: Denis Ahrens <denis@h3q.com> (different initial version)
The new behaviour is on by default, and can be disabled by setting the
net.inet.tcp.rfc3465 sysctl to 0 to obtain previous behaviour.
The patch changes struct tcpcb in sys/netinet/tcp_var.h which breaks
the ABI. Bump __FreeBSD_version to 800061 accordingly. User space tools
that rely on the size of struct tcpcb (e.g. sockstat) need to be recompiled.
Reviewed by: rpaulo, gnn
Approved by: gnn, kmacy (mentors)
Sponsored by: FreeBSD Foundation
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,
The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.
Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion
module; the ath module now brings in the hal support. Kernel
config files are almost backwards compatible; supplying
device ath_hal
gives you the same chip support that the binary hal did but you
must also include
options AH_SUPPORT_AR5416
to enable the extended format descriptors used by 11n parts.
It is now possible to control the chip support included in a
build by specifying exactly which chips are to be supported
in the config file; consult ath_hal(4) for information.
and ifnet functions
- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own
- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers
- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues
This work was supported by Bitgravity Inc. and Chelsio Inc.
This should fix q_time overflow, which happens after 2^32/(86400*hz) days of
uptime (~50days for hz = 1000).
q_time overflow cause following:
- traffic shaping may not work in 'fast' mode (not enabled by default).
- incorrect average queue length calculation in RED/GRED algorithm.
NB: due to ABI change this change is not applicable to stable.
PR: kern/128401
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.
If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.
Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
It turns out I forgot to mention that people really need to make sure
their hints are up to date if they are updating a system through the
serial console.
Requested by: gavin
Reviewed by: gavin
The uart(4) driver has the advantage of supporting a wider variety of
hardware on a greater amount of platforms. This driver has already been
the standard on platforms such as ia64, powerpc and sparc64.
I've decided not to change anything on pc98. I'd rather let people from
the pc98 team look at this.
Approved by: philip (mentor), marcel
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.
This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation
Discussed with: rwatson, re
to rc.conf(5) to remind people where to look for all the details.
People without network connectivity forget basics like this... This
is in keeping with historic UPDATING entries which try to provide
basic information in the entry, and a pointer to more extensive
information documenting the new thing.
commands can be written to /dev/psm%d and status can be read back from it.
- Reflect the change in psm(4) and bump version for ports.
MFC after: 1 week
structure. This allows per-CPU variations of struct pmap on a
single architecture without affecting the machine-independent
fields. As such, the PMAP variations don't affect the ABI. They
become part of it.
historical relic, and are no longer appropriate for either LAN or WAN
mounting. At modern (gigabit and 10 gigabit) LAN speeds packet loss
from socket buffer fill events is common, and sequence numbers wrap
quickly enough that data corruption is possible. TCP solves both of
these problems without imposing significant overhead.
MFC after: 1 month
fields in FTS and FTSENT structs being too narrow. In addition,
the narrow types creep from there into fts.c. As a result, fts(3)
consumers, e.g., find(1) or rm(1), can't handle file trees an ordinary
user can create, which can have security implications.
To fix the historic implementation of fts(3), OpenBSD and NetBSD
have already changed <fts.h> in somewhat incompatible ways, so we
are free to do so, too. This change is a superset of changes from
the other BSDs with a few more improvements. It doesn't touch
fts(3) functionality; it just extends integer types used by it to
match modern reality and the C standard.
Here are its points:
o For C object sizes, use size_t unless it's 100% certain that
the object will be really small. (Note that fts(3) can construct
pathnames _much_ longer than PATH_MAX for its consumers.)
o Avoid the short types because on modern platforms using them
results in larger and slower code. Change shorts to ints as
follows:
- For variables than count simple, limited things like states,
use plain vanilla `int' as it's the type of choice in C.
- For a limited number of bit flags use `unsigned' because signed
bit-wise operations are implementation-defined, i.e., unportable,
in C.
o For things that should be at least 64 bits wide, use long long
and not int64_t, as the latter is an optional type. See
FTSENT.fts_number aka FTS.fts_bignum. Extending fts_number `to
satisfy future needs' is pointless because there is fts_pointer,
which can be used to link to arbitrary data from an FTSENT.
However, there already are fts(3) consumers that require fts_number,
or fts_bignum, have at least 64 bits in it, so we must allow for them.
o For the tree depth, use `long'. This is a trade-off between making
this field too wide and allowing for 64-bit inode numbers and/or
chain-mounted filesystems. On the one hand, `long' is almost
enough for 32-bit filesystems on a 32-bit platform (our ino_t is
uint32_t now). On the other hand, platforms with a 64-bit (or
wider) `long' will be ready for 64-bit inode numbers, as well as
for several 32-bit filesystems mounted one under another. Note
that fts_level has to be signed because -1 is a magic value for it,
FTS_ROOTPARENTLEVEL.
o For the `nlinks' local var in fts_build(), use `long'. The logic
in fts_build() requires that `nlinks' be signed, but our nlink_t
currently is uint16_t. Therefore let's make the signed var wide
enough to be able to represent 2^16-1 in pure C99, and even 2^32-1
on a 64-bit platform. Perhaps the logic should be changed just
to use nlink_t, but it can be done later w/o breaking fts(3) ABI
any more because `nlinks' is just a local var.
This commit also inludes supporting stuff for the fts change:
o Preserve the old versions of fts(3) functions through libc symbol
versioning because the old versions appeared in all our former releases.
o Bump __FreeBSD_version just in case. There is a small chance that
some ill-written 3-rd party apps may fail to build or work correctly
if compiled after this change.
o Update the fts(3) manpage accordingly. In particular, remove
references to fts_bignum, which was a FreeBSD-specific hack to work
around the too narrow types of FTSENT members. Now fts_number is
at least 64 bits wide (long long) and fts_bignum is an undocumented
alias for fts_number kept around for compatibility reasons. According
to Google Code Search, the only big consumers of fts_bignum are in
our own source tree, so they can be fixed easily to use fts_number.
o Mention the change in src/UPDATING.
PR: bin/104458
Approved by: re (quite a while ago)
Discussed with: deischen (the symbol versioning part)
Reviewed by: -arch (mostly silence); das (generally OK, but we didn't
agree on some types used; assuming that no objections on
-arch let me to stick to my opinion)
and newer were supported upgrade paths to -current. After today's
commits, 6.0-RELEASE and newer is supported for jumping to current.
Make that clear in the UPDATING entry. For the pedants out there,
upgrading from FreeBSD_version 600029 and newer should still work.
This represents a point from May 29, 2005 forward. The prior date was
October 16th 2004.
This has the following benefits:
- allows to use the AT keyboard maps in share/syscons/keymaps with
sunkbd(4),
- allows to use kbdmux(4) with sunkbd(4),
- allows Sun RS232 keyboards to be configured and used the same
way as Sun USB keyboards driven by ukbd(4) (which also does AT
keyboard emulation) with X.Org, putting an end to the problem
of native support for the former in X.Org being broken over and
over again.
MFC after: 3 days
the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken
with the introduction of PCI domain support.
As the size of struct pci_conf_io wasn't changed with that commit,
this unfortunately requires the ABI of PCIOCGETCONF to be broken
again in order to be able to provide backwards compatibility to
the old version of that IOCTL.
Requested by: imp
Discussed with: re (kensmith)
Reviewed by: PCI maintainers (imp, jhb)
MFC after: 5 days
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
This commit includes only the kernel files, the rest of the files
will follow in a second commit.
Reviewed by: bz
Approved by: re
Supported by: Secure Computing
/etc/rc.d/sendmail whether or not to run newaliases if the database
is missing or the aliases text file is newer than aliases.db.
In my opinion, the aliases file should never be automatically rebuilt.
The current text form could represent a work in progress. Therefore,
in FreeBSD 7.0, this new option will default to "NO". When this rc.d
change is MFC'ed, it will need to remain "YES" to maintain backward
compatibility.
PR: conf/86252
Approved by: re (kensmith)
MFC after: 3 days
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)