to be assigned to a non-default FIB instance.
You may need to recompile world or ports due to the change of struct ifnet.
Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet
possible to organize subroutines with rules.
The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.
Submitted by: Vadim Goncharov
Discussed by: ipfw@, luigi@
disk if needed. This should fix a potential case when extents are cleared in
activemap but metadata is not updated on disk.
Suggested by: pjd
Approved by: pjd (mentor)
stating if we need to update activemap on disk. This makes keepdirty
serve its purpose -- to reduce number of metadata updates.
Discussed with: pjd
Approved by: pjd (mentor)
gpart_write_partcode_vtoc8 does access out of range of allocated memory.
Check size of bootcode before writing it.
Pointed out by: ru
MFC after: 1 week
applicable here, since modifies the string. Switch to strchr().
- Restore support for undocumented optional parameters of
redir_port and redir_proto, that were disabled in 220835.
- While here, change !isalpha() checks on optinal parameters
for isdigit().
Submitted by: Alexander V. Chernikov <melifaro ipfw.ru>
PR: kern/143653
only receiving the data. In r220271 the unused directions were
disabled using shutdown(2).
Unfortunately, this broke automatic receive buffer sizing, which
currently works only for connections in ETASBLISHED state. It was a
root cause of the issue reported by users, when connection between
primary and secondary could get stuck.
Disable the code introduced in r220271 until the issue with automatic
buffer sizing is not resolved.
Reported by: Daniel Kalchev <daniel@digsys.bg>, danger, sobomax
Tested by: Daniel Kalchev <daniel@digsys.bg>, danger
Approved by: pjd (mentor)
MFC after: 1 week
other device attributes stored in the CAM Existing Device Table (EDT).
This includes some infrastructure requried by the enclosure services
driver to export physical path information.
Make the CAM device advanced info interface accept store requests.
sys/cam/scsi/scsi_all.c:
sys/cam/scsi/scsi_all.h:
- Replace scsi_get_sas_addr() with a scsi_get_devid() which takes
a callback that decides whether to accept a particular descriptor.
Provide callbacks for NAA IEEE Registered addresses and for SAS
addresses, replacing the old function. This is needed because
the old function doesn't work for an enclosure address for a SAS
device, which is not flagged as a SAS address, but is NAA IEEE
Registered. It may be worthwhile merging this interface with the
devid match interface.
- Add a few more defines for some device ID fields.
sbin/camcontrol/camcontrol.c:
- Update for the CCB_DEV_ADVINFO interface change.
cam/cam_xpt_internal.h:
- Add the new fields for the physical path string to the CAM EDT.
cam/cam_ccb.h:
- Rename CCB_GDEV_ADVINFO to simply CCB_DEV_ADVINFO, and the ccb
structure to ccb_dev_advinfo.
- Add a flag that changes this CCB's action to store, rather than
the default, retrieve.
- Add a new buffer type, CDAI_TYPE_PHYS_PATH, for the new CAM EDT
physpath field.
- Remove the never-implemented transport & proto flags.
cam/cam_xpt.c:
cam/cam_xpt.h:
- Add xpt_getattr(), which provides a wrapper for fetching a device's
attribute using the GEOM strings as key. This method currently
supports "GEOM::ident" and "GEOM::physpath".
Submitted by: will
Reviewed by : gibbs
Extend the XPT_DEV_MATCH api to allow a device search by device ID.
As far as the API is concerned, device ID is a binary blob to be
interpreted by the transport layer. The SCSI implementation assumes
it is an array of VPD device ID descriptors.
sys/cam/cam_ccb.h:
Create a new structure, device_id_match_pattern, and
update the XPT_DEV_MATCH datastructures and flags so
that this pattern type can be used.
sys/cam/cam_xpt.c:
- A single pattern matching on both inquiry data and device
ID is invalid. Report any violators.
- Pass device ID match requests through to the new routine
scsi_devid_match(). The direct call of a SCSI routine is
a layering violation, but no worse than the one a few
lines up that checks inquiry data. Defer cleaning this
up until our future, larger, rototilling of CAM.
- Zero out cam_ed and cam_et nodes on allocation. Prior to
this change, device_id_len and device_id were not inialized,
preventing proper detection of the presence of this
information.
sys/cam/scsi/scsi_all.c:
sys/cam/scsi/scsi_all.h:
Add the scsi_match_devid() routine.
Add a helper function for extracting peripherial driver names
sys/cam/cam_periph.c:
sys/cam/cam_periph.h:
Add the cam_periph_list() method which fills an sbuf
with a comma delimited list of the peripheral instances
associated with a given CAM path.
Add a helper functions for SCSI commands used by the SES driver.
sys/cam/scsi/scsi_all.c:
sys/cam/scsi/scsi_all.h:
Add structure definitions and csio filling functions for
the receive diagnostic results and send diagnostic commands.
Misc CAM XPT cleanups.
sys/cam/cam_xpt.c:
Broadcast AC_FOUND_DEVICE and AC_PATH_REGISTERED
events at the time async event handlers are attached
even when registering just for events on a partitular
SIM. Previously, you had to register for these
events on all SIMs in the system in order to get
the initial broadcast even though subsequent device
and path arrivals would be delivered.
sys/cam/cam_xpt.c:
Remove SIM mutex held asserts from path accessors.
CAM paths are reference counted and it is this
reference count, not the sim mutex, that garantees
they are stable.
Sponsored by: Spectra Logic Corporation
"globalport" option for multiple NAT instances.
If ipfw rule contains "global" keyword instead of nat_number, then
for each outgoing packet ipfw_nat looks up translation state in all
configured nat instances. If an entry is found, packet aliased
according to that entry, otherwise packet is passed unchanged.
User can specify "skip_global" option in NAT configuration to exclude
an instance from the lookup in global mode.
PR: kern/157867
Submitted by: Alexander V. Chernikov (previous version)
Tested by: Eugene Grosbein
Document the fact that we might want an IFCAP_CANTCHANGE mask,
even though the value is not yet used in sys/net/if.c
(asked on -current a week ago, no feedback so i assume no objection).
to resolve errors which can cause corruption on recovery with the old
synchronous mechanism.
- Append partial truncation freework structures to indirdeps while
truncation is proceeding. These prevent new block pointers from
becoming valid until truncation completes and serialize truncations.
- On completion of a partial truncate journal work waits for zeroed
pointers to hit indirects.
- softdep_journal_freeblocks() handles last frag allocation and last
block zeroing.
- vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it
is only implemented in one place.
- Block allocation failure handling moved up one level so it does not
proceed with buf locks held. This permits us to do more extensive
reclaims when filesystem space is exhausted.
- softdep_sync_metadata() is broken into two parts, the first executes
once at the start of ffs_syncvnode() and flushes truncations and
inode dependencies. The second is called on each locked buf. This
eliminates excessive looping and rollbacks.
- Improve the mechanism in process_worklist_item() that handles
acquiring vnode locks for handle_workitem_remove() so that it works
more generally and does not loop excessively over the same worklist
items on each call.
- Don't corrupt directories by zeroing the tail in fsck. This is only
done for regular files.
- Push a fsync complete record for files that need it so the checker
knows a truncation in the journal is no longer valid.
Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts)
Tested by: pho
the system to proceed to boot without bailing out into single user mode,
even when the file system can not be successfully mounted.
This option is implemented in mount(8) and not passed into kernel.
MFC after: 1 month
When user wants have specific alignment - do what user wants.
Use stripesize as alignment value in case, when some of gpart's
arguments are ommitted for automatic calculation.
Suggested by: mav
chnage is different to the one suggested in the PR to try to avoid
cluttering the man page too much.
PR: docs/154494
Submitted by: kilian <kilian.klimek googlemail.com>
MFC after: 1 week
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
This controls if accepting a route in an RA message as the default route.
The default value for each interface can be set by net.inet6.ip6.no_radr.
The system wide default value is 0.
- A new sysctl: net.inet6.ip6.norbit_raif. This controls if setting R-bit in
NA on RA accepting interfaces. The default is 0 (R-bit is set based on
net.inet6.ip6.forwarding).
Background:
IPv6 host/router model suggests a router sends an RA and a host accepts it for
router discovery. Because of that, KAME implementation does not allow
accepting RAs when net.inet6.ip6.forwarding=1. Accepting RAs on a router can
make the routing table confused since it can change the default router
unintentionally.
However, in practice there are cases where we cannot distinguish a host from
a router clearly. For example, a customer edge router often works as a host
against the ISP, and as a router against the LAN at the same time. Another
example is a complex network configurations like an L2TP tunnel for IPv6
connection to Internet over an Ethernet link with another native IPv6 subnet.
In this case, the physical interface for the native IPv6 subnet works as a
host, and the pseudo-interface for L2TP works as the default IP forwarding
route.
Problem:
Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
accepting them when net.inet6.ip6.forward=0 cause the following practical
issues:
- A router cannot perform SLAAC. It becomes a problem if a box has
multiple interfaces and you want to use SLAAC on some of them, for
example. A customer edge router for IPv6 Internet access service
using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
physical interface for administration purpose; updating firmware
and so on (link-local addresses can be used there, but GUAs by
SLAAC are often used for scalability).
- When a host has multiple IPv6 interfaces and it receives multiple RAs on
them, controlling the default route is difficult. Router preferences
defined in RFC 4191 works only when the routers on the links are
under your control.
Details of Implementation Changes:
Router Advertisement messages will be accepted even when
net.inet6.ip6.forwarding=1. More precisely, the conditions are as
follow:
(ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
=> Normal RA processing on that interface. (as IPv6 host)
(ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
=> Accept RA but add the router to the defroute list with
rtlifetime=0 unconditionally. This effectively prevents
from setting the received router address as the box's
default route.
(!ACCEPT_RTADV)
=> No RA processing on that interface.
ACCEPT_RTADV and NO_RADR are per-interface knob. In short, all interface
are classified as "RA-accepting" or not. An RA-accepting interface always
processes RA messages regardless of ip6.forwarding. The difference caused by
NO_RADR or ip6.forwarding is whether the RA source address is considered as
the default router or not.
R-bit in NA on the RA accepting interfaces is set based on
net.inet6.ip6.forwarding. While RFC 6204 W-1 rule (for CPE case) suggests
a router should disable the R-bit completely even when the box has
net.inet6.ip6.forwarding=1, I believe there is no technical reason with
doing so. This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
(the default is 0).
Usage:
# ifconfig fxp0 inet6 accept_rtadv
=> accept RA on fxp0
# ifconfig fxp0 inet6 accept_rtadv no_radr
=> accept RA on fxp0 but ignore default route information in it.
# sysctl net.inet6.ip6.norbit_no_radr=1
=> R-bit in NAs on RA accepting interfaces will always be set to 0.
sending. What happens otherwise is that the sender splits all the
traffic into 32k chunks, while the receiver is waiting for the whole
packet. Then for a certain packet sizes, particularly 66607 bytes in
my case, the communication stucks to secondary is expecting to
read one chunk of 66607 bytes, while primary is sending two chunks
of 32768 bytes and third chunk of 1071. Probably due to TCP windowing
and buffering the final chunk gets stuck somewhere, so neither server
not client can make any progress.
This patch also protect from short reads, as according to the manual
page there are some cases when MSG_WAITALL can give less data than
expected.
MFC after: 3 days
partition offsets. If user requests specific alignment and
provider's stripesize is not zero, then use a least common multiple
from the stripesize and user specified value.
Also fix "gpart resize" implementation: do not try to align the partition
size, because the start offset may be not aligned. Instead align the
end offset and then calculate size. Also use stripesize and stripeoffset
for "gpart resize" command.
If compiled in for dual-stack use, test with feature_present(3)
to see if we should register the IPv4/IPv6 address family related
options.
In case there is no "inet" support we would love to go with the
usage() and make the address family mandatory (as it is for anything
but inet in theory). Unfortunately people are used to
ifconfig IF up/down
etc. as well, so use a fallback of "link". Adjust the man page
to reflect these minor details.
Improve error handling printing a warning in addition to the usage
telling that we do not know the given address family in two places.
Reviewed by: hrs, rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 2 weeks
a sync(2) syscall before unmount(2) for the "-f" case.
This avoids a forced dismount from getting stuck for
an NFS mountpoint in sync() when the server is not
responsive. With this commit, forced dismounts should
normally work for the NFS clients, but can take up to
about 1minute to complete.
PR: kern/157365
Reviewed by: kib
MFC after: 2 weeks
16K to 32K and the default fragment size from 2K to 4K.
The rational is that most disks are now running with 4K
sectors. While they can (slowly) simulate 512-byte sectors
by doing a read-modify-write, it is desirable to avoid this
functionality. By raising the minimum filesystem allocation
to 4K, the filesystem will never trigger the small sector
emulation.
Also, the growth of disk sizes has lead us to double the
default block size about every ten years. The rise from 8K
to 16K blocks was done in 2001. So, by the 10-year metric,
the time has come for 32K blocks.
Discussed at: May 2011 BSDCan Developer Summit
Reference: http://wiki.freebsd.org/201105DevSummit/FileSystems
requests as well as number of activemap updates.
Number of BIO_WRITEs and activemap updates are especially interesting, because
if those two are too close to each other, it means that your workload needs
bigger number of dirty extents. Activemap should be updated as rarely as
possible.
MFC after: 1 week
with geometry. And they do recalculation of user specified parameters.
MBR, PC98, VTOC8, EBR schemes are doing that. For these schemes an
auto alignment feature (ie. gpart add -a alignment) would not work.
But it can work for GPT and BSD schemes. BSD scheme usualy is created
inside MBR, so we can use knowledge about offset of MBR partition to
calculate aligned values for BSD partitions.
Use "offset" attribute of the parent provider for better alignment.
MFC after: 2 weeks
clear the options of the current media, i.e. only inherit the instance,
which matches what NetBSD does. Without this it's really non-intuitive
that the following sequence:
ifconfig bge0 media 1000baseT mediaopt full-duplex
ifconfig bge0 media 100baseTX
results in 100baseTX full-duplex to be set or that:
ifconfig bge0 media autoselect mediaopt flowcontrol
ifconfig bge0 media 1000baseT mediaopt full-duplex
tries to set 1000baseT full-duplex with flowcontrol, which isn't suported
und thus fails while the following:
ifconfig re0 media 1000baseT mediaopt flowcontrol,full-duplex
ifconfig re0 media autoselect
just switches to autoselection without flowcontrol.
MFC after: 2 weeks
because we need to do ioctl(2)s, which are not permitted in the capability
mode. What we do now is to chroot(2) to /var/empty, which restricts access
to file system name space and we drop privileges to hast user and hast
group.
This still allows to access to other name spaces, like list of processes,
network and sysvipc.
To address that, use jail(2) instead of chroot(2). Using jail(2) will restrict
access to process table, network (we use ip-less jails) and sysvipc (if
security.jail.sysvipc_allowed is turned off). This provides much better
separation.
MFC after: 1 week
operate on one type of filesystem, mention this.
While here, capitalise the use of "UFS" in growfs.8 to match other uses of
the term in other man pages.
MFC after: 1 week
checking at open time. It may improve performance for read-only
NFS mounts. Use deliberately.
MFC after: 1 week
Reviewed by: rmacklem, jhb (earlier version)
This improves usability a little as we no longer require using touch.
Also reword the manpage wrt. parameters and fix usage() [1]
With no media in a cd(4) drive, the reads will loop producing EINVAL,
abort in that case [2].
Document the shortcoming of sectorsize and MAXPHYS (a quick solution
to this might be having MAXPHYS as the "bigsize", in short testing it
didn't make a difference on throughput).
Submitted by: arundel [1]
PR: bin/154528 [2]
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling. Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.
I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality. There is no reliable way to test it on real hardware.
Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.
Reviewed by: mckusick@
MFC after: 3 weeks
"mdconfig -f file", I decided that it would be easier to make mdconfig
DWIM than to teach my fingers to type the correct command line.
MFC after: 3 weeks
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
hastd process and workers, remove unused one and set different range
of numbers. This is done in order not to confuse them with HASTCTL_CMD
defines, used for conversation between hastctl and hastd, and to avoid
bugs like the one fixed in in r221075.
Approved by: pjd (mentor)
MFC after: 1 week
This avoids a potentially many-hours-long loop of failed writes if newfs
finds a partially-overwritten superblock (or, for that matter, random
garbage which happens to have superblock magic bytes); on one occasion I
found newfs trying to zero 800 million superblocks on a 50 MB disk.
Reviewed by: mckusick
MFC after: 1 week
secondary role. It is possible that the remote node is primary, but only
because there was a role change and it didn't finish cleaning up (unmounting
file systems, etc.). If we detect such situation, wait for the remote node
to switch the role to secondary before accepting I/Os. If we don't wait for
it in that case, we will most likely cause split-brain.
MFC after: 1 week
- We have two nodes connected and synchronized (local counters on both sides
are 0).
- We take secondary down and recreate it.
- Primary connects to it and starts synchronization (but local counters are
still 0).
- We switch the roles.
- Synchronization restarts but data is synchronized now from new primary
(because local counters are 0) that doesn't have new data yet.
This fix this issue we bump local counter on primary when we discover that
connected secondary was recreated and has no data yet.
Reported by: trociny
Discussed with: trociny
Tested by: trociny
MFC after: 1 week
background.
Suggested by: Garrett Cooper <yanegomi@gmail.com>
Use EAGAIN instead of magic value of -2 to report this condition from the
SetAliasAddressFromIfName routine.
MFC after: 2 weeks
hast_proto_recv_hdr() may be used. This also fixes the issue
(introduced by r220523) with hastctl, which crashed on assert in
hast_proto_recv_data().
Suggested and approved by: pjd (mentor)
was incorrect - 'list scan' does not actually do a scan, but instead lists
the results of the background 'scan' cache.
Submitted by: Fabian Keil (freebsd-listen of fabiankeil de) (via email)
Discussed with: bschmidt
MFC after: 3 days
{readline,history}.h are in /usr/include/edit so as to not conflict with
the GNU libreadline versions. To use the libedit readline(3) one should
add "-I/usr/include/edit" to their Makefile
(spelled "-I${DESTDIR}/${INCLUDEDIR}/edit" within the FreeBSD source tree).
* Enable its use in the BSD licensed utilities that support readline(3).
* To make it easier to sync libedit development with NetBSD, histedit.h
is moved into libedit's directory as history shows shown we keep merging
it into that location.
Obtained from: NetBSD
Sponsored by: Juniper Networks
this means that request timed out. Translate the meaningless EAGAIN to
ETIMEDOUT to give administrator a hint that he might need to increase timeout
in configuration file.
MFC after: 1 month
GEOM GATE to fix the issue described in r220264. This also means that we no
longer need -q option, remove it. Don't bother to leaving it as a no-op, as
ggatel(8) is just an example utility.
Before it could change later and we were sending invalid mapsize.
Some time ago I added optimization where when nodes are connected for the
first time and there were no writes to them yet, there is no initial full
synchronization. This bug prevented it from working.
MFC after: 1 week
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
Make `geom XXX list` and `geom XXX status` outputs more consistent:
Add -a options to print all geoms, not only ones with providers.
Add -g option for `status` to report geom's names, not provider's.
Make `status` by default report provider's status (if present), not geom's.
Make `status` report consumer's statuses, not only "synchronized" field.