DRC for NFS over TCP.
- Increase the size of the hash tables.
- Create a separate mutex for each hash list of the TCP hash table.
- Single thread the code that deletes stale cache entries.
- Add a tunable called vfs.nfsd.tcphighwater, which can be increased
to allow the cache to grow larger, avoiding the overhead of frequent
scans to delete stale cache entries.
(The default value will result in frequent scans to delete stale cache
entries, analagous to what the pre-patched code does.)
- Add a tunable called vfs.nfsd.cachetcp that can be used to disable
DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP.
It also adjusts the size of nfsrc_floodlevel dynamically, so that it is
always greater than vfs.nfsd.tcphighwater.
For UDP the algorithm remains the same as the pre-patched code, but the
tunable vfs.nfsd.udphighwater can be used to allow the cache to grow
larger and reduce the overhead caused by frequent scans for stale entries.
UDP also uses a larger hash table size than the pre-patched code.
Reported by: wollman
Tested by: wollman (earlier version of patch)
Submitted by: ivoras (earlier patch)
Reviewed by: jhb (earlier version of patch)
MFC after: 1 month
real JBOD mode (SYS PD) would fail fairly reliably during I/O.
Steal the mfi_disk.c check for this condition (indirectly) when establishing
d_maxsize.
Reviewed by: ambrisko@
MFC after: 4 weeks
Sponsored by: Yahoo! Inc.
Previous bandaid was not appropriate and didn't really work for
all platforms. While here, cleanup the surrounding code to match
ffs_checkoverlap()
Reported by: dim, jmallet and bde
MFC after: 3 weeks
maintaining better LRU of active pages.
- Change v_free_target to include the quantity previously represented by
v_cache_min so we don't need to add them together everywhere we use them.
- Add a pageout_wakeup_thresh that sets the free page count trigger for
waking the page daemon. Set this 10% above v_free_min so we wakeup before
any phase transitions in vm users.
- Adjust down v_free_target now that we're willing to accept more pagedaemon
wakeups. This means we process fewer pages in one iteration as well,
leading to shorter lock hold times and less overall disruption.
- Eliminate vm_pageout_page_stats(). This was a minor variation on the
PQ_ACTIVE segment of the normal pageout daemon. Instead we now process
1 / vm_pageout_update_period pages every second. This causes us to visit
the whole active list every 60 seconds. Previously we would only maintain
the active LRU when we were short on pages which would mean it could be
woefully out of date.
Reviewed by: alc (slight variant of this)
Discussed with: alc, kib, jhb
Sponsored by: EMC / Isilon Storage Division
- Set NOTE_TRACKERR before running filt_proc(). If the knote did not
have NOTE_FORK set in fflags when registered, then the TRACKERR event
could miss being posted.
- Don't pass the pid in to filt_proc() for NOTE_FORK events. The special
handling for pids is done knote_fork() directly and no longer in
filt_proc().
MFC after: 2 weeks
Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition
to the already existing e2fs_lbn_t and adjust them for ext4.
Other than making the code more readable these changes should
fix problems related to big filesystems.
Setting the proper types can be tricky so the process was
helped by looking at UFS. In our implementation, logical block
numbers can be negative and the code depends on it. In ext2,
block numbers are unsigned so it is convenient to keep
e2fs_daddr_t unsigned and use the complete 32 bits. In the
case of e4fs_daddr_t, while the value should be unsigned, for
ext4 we only need to support 48 bits so preserving an extra
bit from the sign is not an issue.
While here also drop the ext2_setblock() prototype that was
never used.
Discussed with: mckusick, bde
MFC after: 3 weeks
routines and thus assert if one passes in a rate code with the
high bit set.
Since the high bit can indicate either IEEE80211_RATE_BASIC or
IEEE80211_RATE_MCS, it's up to the caller to determine whether
the rate is 11n or not, and either mask out the BASIC bit, or
call a different function.
(Yes, this does mean that net80211 should grow 11n-aware rate2phytype()
and rate2plcp() functions..)
This may need to happen for the other drivers - it's currently only
done (now) for iwn(4) and bwi(4).
PR: kern/181100
extensions and also tried to be link time compatible with ports libiconv.
This splits that functionality and enables the parts that shouldn't
interfere with the port by default.
WITH_ICONV (now on by default) - adds iconv.h, iconv_open(3) etc.
WITH_LIBICONV_COMPAT (off by default) adds the libiconv_open etc API, linker
symbols and even a stub libiconv.so.3 that are good enough to be able
to 'pkg delete -f libiconv' on a running system and reasonably expect it
to work.
I have tortured many machines over the last few days to try and reduce
the possibilities of foot-shooting as much as I can. I've successfully
recompiled to enable and disable the libiconv_compat modes, ports that use
libiconv alongside system iconv etc. If you don't enable the
WITH_LIBICONV_COMPAT switch, they don't share symbol space.
This is an extension of behavior on other system. iconv(3) is a standard
libc interface and libiconv port expects to be able to run alongside it on
systems that have it.
Bumped osreldate.
probes declared in a kernel module when that module is unloaded. In
particular,
* Unloading a module with active SDT probes will cause a panic. [1]
* A module's (FBT/SDT) probes aren't destroyed when the module is unloaded;
trying to use them after the fact will generally cause a panic.
This change fixes both problems by porting the DTrace module load/unload
handlers from illumos and registering them with the corresponding
EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all
probes defined in a module when that module is unloaded, and to prevent a
module unload from proceeding if some of its probes are active. The latter
problem has already been fixed for FBT probes by checking lf->nenabled in
kern_kldunload(), but moving the check into the DTrace framework generalizes
it to all kernel providers and also fixes a race in the current
implementation (since a probe may be activated between the check and the
call to linker_file_unload()).
Additionally, the SDT implementation has been reworked to define SDT
providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT
to create and destroy SDT probes when a module is loaded or unloaded. This
simplifies things quite a bit since it means that pretty much all of the SDT
code can live in sdt.ko, and since it becomes easier to integrate SDT with
the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible
in that SDT providers spanning multiple modules can be created on the fly
when a module is loaded; at the moment it looks like illumos' SDT
implementation requires all SDT probes to be statically defined in a single
kernel table.
PR: 166927, 166926, 166928
Reported by: davide [1]
Reviewed by: avg, trociny (earlier version)
MFC after: 1 month
called after the module has been loaded, and the unload handlers are called
before the module is unloaded. Moreover, the module unload handlers may
return an error to prevent the unload from proceeding.
Reviewed by: avg
MFC after: 2 weeks
rather than just queueing. The former code was an attempt at getting
UDP performance up, but there have been customer reports of problems with it,
so the ixgbe approach seems the best solution for now.
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.
This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.
Submitted by: jhb
Reviewed by: jfv, marius, achadd, achim
MFC after: 1 day
the changes. Make sure that pci_alloc_msix() does give us the vectors
we need and fall back to MSI when it doesn't, also release any that
were allocated when insufficient.
MFC after: 3 days
Basic support for extents was implemented by Zheng Liu as part
of his Google Summer of Code in 2010. This support is read-only
at this time.
In addition to extents we also support the huge_file extension
for read-only purposes. This works nicely with the additional
support for birthtime/nanosec timestamps and dir_index that
have been added lately.
The implementation may not work for all ext4 filesystems as
it doesn't support some features that are being enabled by
default on recent linux like flex_bg. Nevertheless, the feature
should be very useful for migration or simple access in
filesystems that have been converted from ext2/3 or don't use
incompatible features.
Special thanks to Zheng Liu for his dedication and continued
work to support ext2 in FreeBSD.
Submitted by: Zheng Liu (lz@)
Reviewed by: Mike Ma, Christoph Mallon (previous version)
Sponsored by: Google Inc.
MFC after: 3 weeks
Linux targets without breaking the existing IOCTL API.
- Remove some not-needed header file inclusions.
- Wrap a long line.
MFC after: 1 week
Reported by: Damjan Jovanovic <damjan.jov@gmail.com>
what is really needed on this code snipped is that all the pages that
are already fully inserted gets fully freed, while for the others the
object removal itself might be skipped, hence the object might be set to
NULL.
Sponsored by: EMC / Isilon storage division
Reported by: alc, kib
Reviewed by: alc
* Break out the single, static RX context into a pointer, and ..
* .. extend it to two RX contexts - a default and a PAN context.
Whilst here, add a few extra fields in preparation for further iwn(4)
work.
Tested:
* Intel 4965, STA mode - same level of stability
* Intel 5100, STA mode - no change
Submitted by: Cedric Gross <cg@gross.info>
In the future, we'll need to come up with new proc_*() functions that accept
locked processes. For now, this prevents postgresql + DTrace from crashing the
system.
MFC after: 1 month
is operational. init_sleepqueues() initializes 256 mutexes, which,
due to witness still being cold, started to overflow the pending_locks
array.
As stated in the reported panic message, increase WITNESS_PENDLIST
from 768 to 1024, which provides space for additional 256 locks.
Reported by: many
Tested by: rakuco, bdrewery
unneeded checks for NULL, free(9) can handle NULL pointers on its own,
and the regions were allocated with M_WAITOK flag as well.
Reported and tested by: Larry Rosenman <ler@lerctr.org>
MFC after: 1 week
additional information, when the page is guaranteed to not belong to a
paging queue. Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.
Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked. See r141955 and
r253140 for examples.
Change the pageq member into a union containing explicitly-typed
members. Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().
Requested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
- Use the right address when calling kva_free()
(Is there any reason why the s3c2xx0 comes with its own version of bs_map/
bs_unmap ? It seems to be just the same as in bus_space_generic.c)
random_adaptor is basically an adapter that plugs in to random(4).
random_adaptor can only be plugged in to random(4) very early in bootup.
Unplugging random_adaptor from random(4) is not supported, and is probably a
bad idea anyway, due to potential loss of entropy pools.
We currently have 3 random_adaptors:
+ yarrow
+ rdrand (ivy.c)
+ nehemeiah
* Remove platform dependent logic from probe.c, and move it into
corresponding registration routines of each random_adaptor provider.
probe.c doesn't do anything other than picking a specific random_adaptor
from a list of registered ones.
* If the kernel doesn't have any random_adaptor adapters present then the
creation of /dev/random is postponed until next random_adaptor is kldload'ed.
* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
system wide one.
Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: so (des)
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.
In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.
vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl
Now the MTX_RECURSE flag can be passed to the mtx_*_flag() calls.
This helps in cases we want to narrow down to specific calls the
possibility to recurse for some locks.
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, alc
Tested by: pho
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.
Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag
The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.
Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl
- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
jhb
X-MFC after: never (change specific to head)
bit25 of rxMode MAC register of 5762 needs to be set for rx mgmt
filter to work correctly when processing match for UDP header
fields. Otherwise false positive can occur which causes IPv4
fragment to be received by APE instead of host.
Reported by: Geans Pin <geanspin@broadcom.com>
Add a tunable that allows such a device to be excluded from the driver.
The id parameter is the target id that the driver assigns to a given device.
dev.mps.X.exclude_ids=<id>,<id>
Obtained from: Netflix
MFC after: 3 days
Illumos ZFS issues:
3957 ztest should update the cachefile before killing itself
3958 multiple scans can lead to partial resilvering
3959 ddt entries are not always resilvered
3960 dsl_scan can skip over dedup-ed blocks if
physical birth != logical birth
3961 freed gang blocks are not resilvered and can cause pool to suspend
3962 ztest should print out zfs debug buffer before exiting
match devices where the driver class was fixed but the unit number was
wildcarded. This better matches the documented behaviour in
DEVICE_PROBE(9).
Reviewed by: imp
the GEOM_PART. Instead of just using number of entries from the GPT
header, calculate this limit based on the reserved space between
GPT header and first available LBA.
MFC after: 2 weeks
(or svnliteversion) in the current lookup path is not what
was used to check out the tree. If an incompatible version
is used, the svn revision number is not reported in uname(1).
Run ${svnversion} on newvers.sh itself when evaluating if the
svn(1) in use is compatible with the tree. Fallback to an
empty ${svnversion} if necessary.
With this change, svnliteversion from base is only used
if no compatible svnversion is found, so with this change,
the version of svn(1) from the ports tree is evaluated first.
Requested by: many
MFC after: 3 days
X-MFC-To: stable/9, releng/9.2 only
I haven't yet reviewed the Intel driver(s) in more depth to see if
there are 1x1 NICs that report they support 2 transmit/receive chains..
if so then we'll have to update this.
Tested:
* Intel 4965, which is a 2x2 device with 3 RX and 2 TX chains.
PR: kern/181132
upper layer(s).
This eliminates the if_snd queue from net80211. Yay!
This unfortunately has a few side effects:
* It breaks ALTQ to net80211 for now - sorry everyone, but fixing
parallelism and eliminating the if_snd queue is more important
than supporting this broken traffic scheduling model. :-)
* There's no VAP and IC flush methods just yet - I think I'll add
some NULL methods for now just as placeholders.
* It reduces throughput a little because now net80211 will drop packets
rather than buffer them if the driver doesn't do its own buffering.
This will be addressed in the future as I implement per-node software
queues.
Tested:
* ath(4) and iwn(4) in STA operation
This is a cosmetic change but it does help with a proposed change to increase
the maximum size of physical memory supported on amd64 platforms.
Submitted by: Chris Torek (torek@torek.net)
Fix a regression introduced by fix for Illumos bug #3834. Quote from
Matthew Ahrens on the Illumos issue:
ztest fails this assertion because ztest_dmu_read_write() does
dmu_tx_hold_free(tx, bigobj, bigoff, bigsize);
and then
dmu_object_set_checksum(os, bigobj,
(enum zio_checksum)ztest_random_dsl_prop(ZFS_PROP_CHECKSUM), tx);
If the region to free is past the end of the file, the DMU assumes that there
will be nothing to do for this object. However, ztest does set_checksum(),
which must modify the dnode. The fix is for ztest to also call
dmu_tx_hold_bonus(tx, bigobj);
so we can account for the dirty data associated with setting the checksum
Illumos ZFS issues:
3955 ztest failure: assertion refcount_count(&tx->tx_space_written)
+ delta <= tx->tx_space_towrite
Merge vendor bugfix for ZFS test suite that triggers false positives.
Illumos ZFS issues:
3949 ztest fault injection should avoid resilvering devices
3950 ztest: deadman fires when we're doing a scan
3951 ztest hang when running dedup test
3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together
if NOTE_EXIT is not being monitored. The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT. This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
knote without signalling it to userland and use this when a process
exits but the fflags in the knote is zero.
Reviewed by: jmg
MFC after: 1 month
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.
Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
pvh_global_lock. This allows the method to be executed in parallel,
avoiding undue contention on the pvh_global_lock for the multithreaded
pagedaemon.
The pmap_ts_referenced() function has to inspect the page mappings for
several pmaps, which need to be locked while pv list lock is owned.
This contradicts to the lock order, where pmap lock is before pv list
lock. Introduce the generation count for the pv list of the page or
superpage, which indicate any change in the pv list, and, as usual,
perform restart of the iteration if generation changed while pv lock
was dropped for blocking acquire of a pmap lock.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
line boundary. It has never been 100% correct, and it can't work on SMP,
because nothing prevents another core from accessing data from an unrelated
buffer in the same cache line while we invalidated it. Just use bounce pages
instead.
Reviewed by: ian
Approved by: mux (mentor) (implicit)
Add support for A20 timer.
Correct interrupt offset depending from chip.
Add basic code for CPU configuration module.
For now, add kernel config and dts file
(only FDT blob related problem needs to be solved later in
order to have one kernel for both cubieboard1 and 2).
Approved by: ray@
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.
X-MFC after: never (change specific to head)
transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
being defined in <sys/diskmbr.h>. Instead give the symbols here a
"PC98_" prefix. This way, both <sys/diskmbr.h> and <sys/diskpc98.h>
can be included in the same C source file.
The renaming is trivial. The only gotcha is that DOSBBSECTOR is
also redefined from 0 to 1. This because DOSBBSECTOR was always
used in conjunction with an addition of 1. The PC98_BBSECTOR symbol
is defined as 1 and the expression is simplified.
Note: it is not believed that ports are seriously impacted; or at
all for that matter.
Approved by: nyan@
controller supports only a single message. I haven't seen such an adapter
out in the wild, though, so this change likely is a NOP.
While at it, further simplify the MSI allocation logic; there's no need
to check the number of available messages on our own as pci_alloc_msi(9)
will just fail if it can't provide us with the single message we want.
- Nuke the unused softc of aacch(4).
MFC after: 1 month
quirk and apply it to these controllers [1]. The same problem was reported
for 2230S, in which case it wasn't actually clear whether the culprit is the
controller or the mainboard, though. In order to be on the safe side, flag
MSIs as being broken with the latter type of controller as well. Given that
these are the only reports of MSI-related breakage with aac(4) so far and
OSes like OpenSolaris unconditionally employ MSIs for all adapters of this
family, however, it doesn't seem warranted to generally disable the use of
MSIs in aac(4).
While it, simplify the MSI allocation logic a bit; there's no need to check
for the presence of the MSI capability on our own as pci_alloc_msi(9) will
just fail when these kind of interrupts are not available.
Reported and tested by: David Boyd [1]
MFC after: 3 days
collision between a rename and an open system call for the same
target file. Here, rename releases its vnode references, waits for
the open to finish, and then restarts by reacquiring its needed
vnode locks. In this case, rename was unlocking but failing to
release its reference to one of its held vnodes. The effect was
that even after all the actual references to the vnode had gone,
the vnode still showed active references. For files that had been
removed, their space was not reclaimed until the filesystem was
forcibly unmounted.
This bug manifested itself in the Postgres server which would
leak/lose hundreds of files per day amounting to many gigabytes of
disk space. This bug required shutting down Postgres, forcibly
unmounting its filesystem, remounting its filesystem and restarting
Postgres every few days to recover the lost space.
Reported by: Dan Thomas and Palle Girgensohn
Bug-fix by: kib
Tested by: Dan Thomas and Palle Girgensohn
MFC after: 2 weeks
Do this by forcing inclusion of
sys/cddl/compat/opensolaris/sys/debug_compat.h
via -include option into all source files from OpenSolaris.
Note that this -include option must always be after -include opt_global.h.
Additionally, remove forced definition of DEBUG for some modules and fix
their build without DEBUG.
Also, meaning of DEBUG was overloaded to enable WITNESS support for some
OpenSolaris (primarily ZFS) locks. Now this overloading is removed and
that use of DEBUG is replaced with a new option OPENSOLARIS_WITNESS.
MFC after: 17 days
to get the semantics when setting the PMAP right. Prior to r251782, the
latter already used implicit acquire semantics, which - currently - means
to not employ additional explicit memory barriers under the hood (see also
r225889).
Before executing any subcommand, zpool tool fetches pools configuration from
the kernel. Before features support was added, kernel was regenerating that
configuration based on data always present in memory. Unfortunately, pool
features list and activity counters are not such. They are stored in ZAP,
that normally resides in ARC, but under heavy memory pressure may be swapped
out. If pool is suspended at this point, there is no way to recover it back
since any zpool command will stuck.
This change has one predictable flaw: `zpool upgrade` always wish to upgrade
suspended pools, but fortunately it can't do it due to the suspension.
disable TRIM otherwise.
r252840 (illumos bug 3836) is based on assumption that zio_free_sync() has
no lock dependencies and should complete immediately. Unfortunately, with our
TRIM implementation that is not true due to ZIO_STAGE_VDEV_IO_START added
to the ZIO_FREE_PIPELINE, which, while not really accessing devices, still
acquires SCL_ZIO lock for read to be sure devices won't disappear.
When TRIM is disabled, this patch enables direct free execution from r252840
and removes ZIO_STAGE_VDEV_IO_START and ZIO_STAGE_VDEV_IO_ASSESS stages from
the pipeline to avoid lock acquisition. Otherwise it queues free request as
it was before r252840.
Existing async thread is running only on successfull spa_sync() completion,
that is impossible in case of pool loosing required (last) disk(s). That
indefinite delay of SPA_ASYNC_REMOVE processing made ZFS to not close the
lost disks, preventing GEOM/CAM from destroying devices and reusing names
on later disk reattach.
In earlier version of the patch I've tried to just run existing thread
immediately, unrelated to spa_sync() completion, but that exposed number
of situations where it could stuck due to locks held by stuck spa_sync(),
that are required for other kinds of async events.
Experiments with OpenIndiana snapshot confirmed that they also have this
issue with lost disks reattach.
PID is valid for monitoring in FILEMON_SET_PID ioctl.
- Set the monitored PID to -1 when the process exits.
Suggested by: jilles
Tested by: sjg
MFC after: 3 days
persist much longer than previously. Historically we had at most 100
entries; now the count may reach a million. With the increased count
we spent far too much time looking them up in the grossly undersized
newblk hash table. Configure the newblk hash table to accurately reflect
the number of entries that it must index.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
we need to collect the highest level of allocation for each of the
different soft update dependency structures. This change collects these
statistics and makes them available using `sysctl debug.softdep.highuse'.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
PF_INET6 in kernel. This fixes various malfunction when the wall time
clock is changed. Bump __FreeBSD_version to 1000041.
- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.
MFC after: 1 month
because an exception may happen at any time. The stack alignment rules on
ARM EABI state the only place the stack must be 8-byte aligned is on a
function boundary.
If an exception happens while a function is setting up or tearing down it's
stack frame it may not be correctly aligned. There is also no requirement
for it to be when the function is a leaf node.
The fix is to align the stack after we have stored a backup of the old stack
pointer, but before we have stored anything in the trapframe. Along with
this we need to adjust the size of the trapframe by 4 bytes to ensure the
stack below it is also correctly aligned.
in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not
specified in the lkflags, the lookup is supposed to return an alive
vnode whenever the underlying node is valid.
Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached
to node exists and is being reclaimed. This causes spurious ENOENT
errors from lookup on tmpfs and corresponding random 'No such file'
failures from syscalls working with tmpfs files.
Fix this by waiting for the doomed vnode to be detached from the tmpfs
node if sleepable allocation is requested.
Note that filesystems which use vfs_hash.c, correctly handle the case
due to vfs_hash_get() looping when vget() returns ENOENT for sleepable
requests.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Change CCB queue resize logic to be able safely handle overallocations:
- (re)allocate queue space in power of 2 chunks with 64 elements minimum
and never shrink it; with only 4/8 bytes per element size is insignificant.
- automatically reallocate the queue to double size if it is overflowed.
- if queue reallocation failed, store extra CCBs in unsorted TAILQ,
fetching them back as soon as some queue element is freed.
To free space in CCB for TAILQ linking, change highpowerq from keeping
high-power CCBs to keeping devices frozen due to high-power CCBs.
This encloses all pieces of queue resize logic inside of cam_queue.[ch],
removing some not obvious duties from xpt_release_ccb().
the tree version, for example if the tree is checked out with an
outdated svn from ports, but the base system svnlite is built.
Approved by: kib (mentor)
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.
Before this patch is reinserted we need to break this ordering.
Sponsored by: EMC / Isilon storage division
Reported by: kib
huge pages in the kernel's address space. This works around several
asserts from pmap_demote_pde_locked that did not apply and gave false
warnings.
Discovered by: pho
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages
Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).
After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).
Fixing such primitive can bring to complete removal of the page hold
mechanism.
Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho
kern_sendfile() which is unnecessary.
The page is already wired so it will not be subjected to pagefault.
The content cannot be effectively protected as it is full of races
already.
Multiple accesses to the same indexes are serialized through vn_rdwr().
Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: pho
Instead of hard-coding the uart register addresses for the imx51, use
a variable that defaults to the imx51 address. When debugging another
imx-family SoC, the variable can be set early in initarm() to provide
full console/printf support for debugging early boot.
architectural state on CR vmexits by guaranteeing
that EFER, CR0 and the VMCS entry controls are
all in sync when transitioning to IA-32e mode.
Submitted by: Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)
this is is crucial at least for the latter.
What happens is that attaching uart(4) to scc(4) causes the SAB 82532 to
"receive" something and trigger a SER_INT_RXREADY interrupt, given that
at least fast/filter interrupts are already enabled. Prior to r253161,
uart_bus_ihand() was set up at this point and handled that condition,
i. e. read the RX FIFO and issued a Receive Message Complete.
Now, uart_bus_ihand() and uart_intr() are setup after attaching uart(4),
leaving the SER_INT_RXREADY interrupt triggered during the latter to
be handled by the iclear method. However, with that method not implement,
this in turn causes SAB 82532 to not issue any further SER_INT_RXREADY
interrupts until the RX FIFO is full again. Thus, 15 received bytes go
to nowhere, given that "the other half" of the RX FIFO is used for status
information. Hence, implementing sab82532_bfe_iclear() fixes things again.
Potentially, the same problem exists for QUICC.
- Remove unnecessary __RMAN_RESOURCE_VISIBLE.
- Remove a superfluous header.
- Use KOBJMETHOD_END.
- Mark unused arguments as such.
- Remove variables unused after initialization.
Reviewed by: marcel (earlier version)
IDs for new devices.
* Add new device IDs
* Extend the ID probe code to include the newer range of bits used
by later model devices
Tested:
* Intel 5100, STA mode
TODO:
* Test on Intel 4965, just to be sure
Submitted by: Cedric GROSS <cg@gross.info>
* Add in some new register debugging under IWN_DEBUG_REGISTER
* Make IWN_DEBUG an option now for building. I'll chase this up
with a commit to 'options' soon.
Submitted by: Cedric GROSS <cg@cgross.info>
- mbuf reused after an RX_COPY optimized operation can sometimes have
a bogus cached address, resulting in TCP hangs. Add critical save points
to the cached address. Thanks to Michael and the team at Verisign for
finding this problem.
- A couple more spots where the rxbuf->flags member should be cleared just
to be sure no incorrect RX_COPY state is left around. Thanks to Adrian
for tracking these down.
- Remove the rearm_queues function from the driver, this was found to be
responsible for some out-of-order packets by Verisign, and was always a
bandaid, with the other fixes in this delta the bandaid can finally be
removed.
- In the other/link interrupt handler the entire state of the EICS register
was being writen back into EICR (which clears causes and thus re-enables
those interrupts), this was wrong, so now mask off the queue portion of
the register value, so we only clear the other/link interrupt we intend.
Marc from Verisign found this.
- Make the SFP+ unsupported option tuneable now, by customer request.
- Finally, just a couple of minor DEBUG string fixes.
I want to call out and thank all the participants in the 10G community/Intel
calls for helping track down these problems and make the driver better for
everyone!
MFC after: 3 days, these are critical fixes for 9.2!
interpreter that can be run on the system and as such cannot be
compiled against libbstand. On the one hand this means we need to
include the usual headers for system interfaces that we use and
on the the other hand we can only use standard system interfaces.
While here, define local variables only when needed to make this
WARNS=2 clean on amd64.
PR: 172542
Obtained from: peterj@
Pointed out by: Jan Beich <jbeich@tormail.org>
The htree implementation uses code derived from the
RSA Data Security, Inc. MD4 Message-Digest Algorithm.
Add a proper licensing statement for the code and clarify
the corresponding comments.
Approved by: core (hrs)
of unloading the module while VMs existed. This would
result in EBUSY, but would prevent further operations
on VMs resulting in the module being impossible to
unload.
Submitted by: Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)
Reviewed by: grehan, neel
This was exposed with AP spinup of Linux, and
booting OpenBSD, where the CR0 register is unconditionally
written to prior to the longjump to enter protected
mode. The CR-vmexit handling was not updating CPU state which
resulted in a vmentry failure with invalid guest state.
A follow-on submit will fix the CPU state issue, but this
fix prevents the CR-vmexit prior to entering protected
mode by properly initializing and maintaining CR* state.
Reviewed by: neel
Reported by: Gopakumar.T @ netapp
other than the one specified by the BOOTP server. This configures NFS
using the BOOTP protocol while also respecting other root-path options such
as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT
boot option. It allows you to override the root path provided by the
server, or to supply a root path when the server provides IP configuration
but no root path info.
This maintains the historical BOOTP_NFSROOT behavior of panicking on a
failure to mount the root path provided by the server, unless you've
provided an alternative via the ROOTDEVNAME kernel option or by setting
vfs.root.mountfrom. The behavior of panicking when given no other options
is preserved because it amounts to a bit of a retry loop that could
eventually recover from a transient network or server problem.
The user can now override the root path from loader(8) even if the
kernel is compiled with BOOTP_NFSROOT. If vfs.root.mountfrom is set in
the environment it is used unconditionally -- it always overrides the
BOOTP info. If it begins with [old]nfs: then the BOOTP code uses it
instead of the server-provided info. If it specifies some other
filesystem then the bootp code will not panic like it used to and the code
in vfs_mountroot.c will invoke the right filesystem to do the mount.
If the kernel is compiled with the ROOTDEVNAME option, then that name is
used by the BOOTP code if either
* The server doesn't provide a pathname.
* The boothowto flags include RB_DFLTROOT.
The latter allows the user to compile in alternate path in ROOTDEVNAME
such as ufs:/dev/da0s1a and boot from that path by setting
boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8).
The one thing not provided here is automatic failover from a
server-provided path to a compiled-in one without the user manually
requesting that. The code just isn't currently structured in a way that
makes that possible with a lot of rewrite. I think the ability to set
vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server
doesn't provide a name covers the most common needs.
A set of patches submitted by Lars Eggert provided the part I couldn't
figure out by myself when I tried to do this last year; many thanks.
Reviewed by: rodrigc
Skip eviction step of processing free records when doing ZFS
receive to avoid the expensive search operation of non-existent
dbufs in dn_dbufs.
Illumos ZFS issues:
3834 incremental replication of 'holey' file systems is slow
MFC after: 2 weeks
To quote Illumos issue #3888:
When 'zfs recv -F' is used with an incremental recv it rolls
back any changes made since the last snapshot in case new
changes were made to the file system while the recv is in
progress (without -F the recv would fail when it does it's
final check to commit the recv-ed data as the recv-ed data
conflicts with the newly written data).
However, if there is a snapshot taken after the recv began
rolling back to the 'latest' snapshot will not help and the
recv will still fail. 'zfs recv -F' should be extended to
destroy any snapshots created since the source snapshot when
finishing the recv (effectively rolling back through all
snapshots, instead of just to the latest snapshot).
Illumos ZFS issues:
3888 zfs recv -F should destroy any snapshots created since the
incremental source
MFC after: 2 weeks