If compiled in for dual-stack use, test with feature_present(3)
to see if we should register the IPv4/IPv6 address family related
options.
In case there is no "inet" support we would love to go with the
usage() and make the address family mandatory (as it is for anything
but inet in theory). Unfortunately people are used to
ifconfig IF up/down
etc. as well, so use a fallback of "link". Adjust the man page
to reflect these minor details.
Improve error handling printing a warning in addition to the usage
telling that we do not know the given address family in two places.
Reviewed by: hrs, rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 2 weeks
a sync(2) syscall before unmount(2) for the "-f" case.
This avoids a forced dismount from getting stuck for
an NFS mountpoint in sync() when the server is not
responsive. With this commit, forced dismounts should
normally work for the NFS clients, but can take up to
about 1minute to complete.
PR: kern/157365
Reviewed by: kib
MFC after: 2 weeks
16K to 32K and the default fragment size from 2K to 4K.
The rational is that most disks are now running with 4K
sectors. While they can (slowly) simulate 512-byte sectors
by doing a read-modify-write, it is desirable to avoid this
functionality. By raising the minimum filesystem allocation
to 4K, the filesystem will never trigger the small sector
emulation.
Also, the growth of disk sizes has lead us to double the
default block size about every ten years. The rise from 8K
to 16K blocks was done in 2001. So, by the 10-year metric,
the time has come for 32K blocks.
Discussed at: May 2011 BSDCan Developer Summit
Reference: http://wiki.freebsd.org/201105DevSummit/FileSystems
requests as well as number of activemap updates.
Number of BIO_WRITEs and activemap updates are especially interesting, because
if those two are too close to each other, it means that your workload needs
bigger number of dirty extents. Activemap should be updated as rarely as
possible.
MFC after: 1 week
with geometry. And they do recalculation of user specified parameters.
MBR, PC98, VTOC8, EBR schemes are doing that. For these schemes an
auto alignment feature (ie. gpart add -a alignment) would not work.
But it can work for GPT and BSD schemes. BSD scheme usualy is created
inside MBR, so we can use knowledge about offset of MBR partition to
calculate aligned values for BSD partitions.
Use "offset" attribute of the parent provider for better alignment.
MFC after: 2 weeks
clear the options of the current media, i.e. only inherit the instance,
which matches what NetBSD does. Without this it's really non-intuitive
that the following sequence:
ifconfig bge0 media 1000baseT mediaopt full-duplex
ifconfig bge0 media 100baseTX
results in 100baseTX full-duplex to be set or that:
ifconfig bge0 media autoselect mediaopt flowcontrol
ifconfig bge0 media 1000baseT mediaopt full-duplex
tries to set 1000baseT full-duplex with flowcontrol, which isn't suported
und thus fails while the following:
ifconfig re0 media 1000baseT mediaopt flowcontrol,full-duplex
ifconfig re0 media autoselect
just switches to autoselection without flowcontrol.
MFC after: 2 weeks
because we need to do ioctl(2)s, which are not permitted in the capability
mode. What we do now is to chroot(2) to /var/empty, which restricts access
to file system name space and we drop privileges to hast user and hast
group.
This still allows to access to other name spaces, like list of processes,
network and sysvipc.
To address that, use jail(2) instead of chroot(2). Using jail(2) will restrict
access to process table, network (we use ip-less jails) and sysvipc (if
security.jail.sysvipc_allowed is turned off). This provides much better
separation.
MFC after: 1 week
operate on one type of filesystem, mention this.
While here, capitalise the use of "UFS" in growfs.8 to match other uses of
the term in other man pages.
MFC after: 1 week
checking at open time. It may improve performance for read-only
NFS mounts. Use deliberately.
MFC after: 1 week
Reviewed by: rmacklem, jhb (earlier version)
This improves usability a little as we no longer require using touch.
Also reword the manpage wrt. parameters and fix usage() [1]
With no media in a cd(4) drive, the reads will loop producing EINVAL,
abort in that case [2].
Document the shortcoming of sectorsize and MAXPHYS (a quick solution
to this might be having MAXPHYS as the "bigsize", in short testing it
didn't make a difference on throughput).
Submitted by: arundel [1]
PR: bin/154528 [2]
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling. Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.
I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality. There is no reliable way to test it on real hardware.
Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.
Reviewed by: mckusick@
MFC after: 3 weeks
"mdconfig -f file", I decided that it would be easier to make mdconfig
DWIM than to teach my fingers to type the correct command line.
MFC after: 3 weeks
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
hastd process and workers, remove unused one and set different range
of numbers. This is done in order not to confuse them with HASTCTL_CMD
defines, used for conversation between hastctl and hastd, and to avoid
bugs like the one fixed in in r221075.
Approved by: pjd (mentor)
MFC after: 1 week
This avoids a potentially many-hours-long loop of failed writes if newfs
finds a partially-overwritten superblock (or, for that matter, random
garbage which happens to have superblock magic bytes); on one occasion I
found newfs trying to zero 800 million superblocks on a 50 MB disk.
Reviewed by: mckusick
MFC after: 1 week
secondary role. It is possible that the remote node is primary, but only
because there was a role change and it didn't finish cleaning up (unmounting
file systems, etc.). If we detect such situation, wait for the remote node
to switch the role to secondary before accepting I/Os. If we don't wait for
it in that case, we will most likely cause split-brain.
MFC after: 1 week
- We have two nodes connected and synchronized (local counters on both sides
are 0).
- We take secondary down and recreate it.
- Primary connects to it and starts synchronization (but local counters are
still 0).
- We switch the roles.
- Synchronization restarts but data is synchronized now from new primary
(because local counters are 0) that doesn't have new data yet.
This fix this issue we bump local counter on primary when we discover that
connected secondary was recreated and has no data yet.
Reported by: trociny
Discussed with: trociny
Tested by: trociny
MFC after: 1 week
background.
Suggested by: Garrett Cooper <yanegomi@gmail.com>
Use EAGAIN instead of magic value of -2 to report this condition from the
SetAliasAddressFromIfName routine.
MFC after: 2 weeks
hast_proto_recv_hdr() may be used. This also fixes the issue
(introduced by r220523) with hastctl, which crashed on assert in
hast_proto_recv_data().
Suggested and approved by: pjd (mentor)
was incorrect - 'list scan' does not actually do a scan, but instead lists
the results of the background 'scan' cache.
Submitted by: Fabian Keil (freebsd-listen of fabiankeil de) (via email)
Discussed with: bschmidt
MFC after: 3 days
{readline,history}.h are in /usr/include/edit so as to not conflict with
the GNU libreadline versions. To use the libedit readline(3) one should
add "-I/usr/include/edit" to their Makefile
(spelled "-I${DESTDIR}/${INCLUDEDIR}/edit" within the FreeBSD source tree).
* Enable its use in the BSD licensed utilities that support readline(3).
* To make it easier to sync libedit development with NetBSD, histedit.h
is moved into libedit's directory as history shows shown we keep merging
it into that location.
Obtained from: NetBSD
Sponsored by: Juniper Networks
this means that request timed out. Translate the meaningless EAGAIN to
ETIMEDOUT to give administrator a hint that he might need to increase timeout
in configuration file.
MFC after: 1 month
GEOM GATE to fix the issue described in r220264. This also means that we no
longer need -q option, remove it. Don't bother to leaving it as a no-op, as
ggatel(8) is just an example utility.
Before it could change later and we were sending invalid mapsize.
Some time ago I added optimization where when nodes are connected for the
first time and there were no writes to them yet, there is no initial full
synchronization. This bug prevented it from working.
MFC after: 1 week
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
Make `geom XXX list` and `geom XXX status` outputs more consistent:
Add -a options to print all geoms, not only ones with providers.
Add -g option for `status` to report geom's names, not provider's.
Make `status` by default report provider's status (if present), not geom's.
Make `status` report consumer's statuses, not only "synchronized" field.
equal to secondary counters:
primary_localcnt = secondary_remotecnt
primary_remotecnt = secondary_localcnt
Previously it was done wrong and split-brain was observed after
primary had synchronized up-to-date data from secondary.
Approved by: pjd (mentor)
MFC after: 1 week
We can use capsicum for secondary worker processes and hastctl.
When working as primary we drop privileges using chroot+setgid+setuid
still as we need to send ioctl(2)s to ggate device, for which capsicum
doesn't allow (yet).
X-MFC after: capsicum is merged to stable/8
our info about worker processes if any of them was terminated in the meantime.
This fixes the problem with 'hastctl status' running from a hook called on
split-brain:
1. Secondary calls a hooks and terminates.
2. Hook asks for resource status via 'hastctl status'.
3. The main hastd handles the status request by sending it to the secondary
worker who is already dead, but because signals weren't checked yet he
doesn't know that and we get EPIPE.
MFC after: 1 week
This way we know how to connect to secondary node when we are primary.
The same variable is used by the secondary node - it only accepts
connections from the address stored in 'remote' variable.
In cluster configurations it is common that each node has its individual
IP address and there is one addtional shared IP address which is assigned
to primary node. It seems it is possible that if the shared IP address is
from the same network as the individual IP address it might be choosen by
the kernel as a source address for connection with the secondary node.
Such connection will be rejected by secondary, as it doesn't come from
primary node individual IP.
Add 'source' variable that allows to specify source IP address we want to
bind to before connecting to the secondary node.
MFC after: 1 week
connection so the worker will exit if it does not receive packets from
the primary during this interval.
Reported by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Tested by: Christian Vogt <Christian.Vogt@haw-hamburg.de>
Approved by: pjd (mentor)
MFC after: 1 week
This makes partitions between 50GiB and 2TiB (16TiB for 4k drives) print
correctly aligned.
While here, fix type of secsize. g_sectorsize() returns ssize_t, don't
store this in an unsigned var. Bump WARNS to 6.
MFC after: 4 weeks
- the default label now includes an a: partition by default
- the c: partition is no longer exported via devfs
- writing of the labels usually works in all cases, though the script
assumes half of them have to fail
partitions instead of partition's indexes. This may be useful with
GPT partitioning scheme or EBR without GEOM_PART_EBR_COMPAT option.
MFC after: 2 weeks
- Load support for %T for pritning time.
- Add support for %N for printing number in human readable form.
- Add support for %S for printing sockaddr structure (currently only AF_INET
family is supported, as this is all we need in HAST).
- Disable gcc compile-time format checking as this will no longer work.
MFC after: 2 weeks
- HOLE - it simply turns all-zero blocks into few bytes header;
it is extremely fast, so it is turned on by default;
it is mostly intended to speed up initial synchronization
where we expect many zeros;
- LZF - very fast algorithm by Marc Alexander Lehmann, which shows
very decent compression ratio and has BSD license.
MFC after: 2 weeks