makefs(1) has a number of signedness warnings (when built with higher
WARNS), most of which can be addressed by careful application of casts
in makefs itself.
There is one case where a signedness warning arises from the blksize
macro, so must be addressed in the macro itself.
Reviewed by: kib, mckusick
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10589
Background:
The same VM object might be shared by multiple processes and the
mm_struct is usually freed when a process exits.
Grab a reference on the mm_struct while the vmap is in the
linux_vma_head list in case the first process which inserted a VM
object has exited.
Tested by: kwm @
MFC after: 1 week
Sponsored by: Mellanox Technologies
operate similarly, other than not needing the delayed invalidation.
It has been tested with artificial injection of vm_page_alloc failures
while running 'sort /dev/zero'.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10574
Increase hw.psm.synaptics.min_pressure default value from 16 to 32
to nearly match Linux driver (30-35 hysteresis loop).
This makes libinput tap detection more reliable.
Reviewed by: gonzo
Approved by: gonzo (mentor)
MFC after: 2 weeks
It is already included via sys/systm.h
Reviewed by: gonzo
Approved by: gonzo (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10266
models. This change is based on Linux driver.
Determine logical trace size. It used for calculation of touch sizes
in surface units for MT-protocol type B evdev reports.
Reviewed by: gonzo
Approved by: gonzo (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10266
Linux does. It should not affect gesture processing in current state as it
ignores finger coords on 3-finger tap detection but it should make evdev
reports looking more Linux-alike.
Reviewed by: gonzo
Approved by: gonzo
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10266
This is done with discarding pointer movements rather then mouse packets
Reviewed by: gonzo
Approved by: gonzo (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10266
Elan hw v.4 touchpads often sends touchpad release packet right after
touchpad touch one. Most probably this happens due to PS/2 limited bandwith.
Reducing of tap_min_queue size to 1 makes multifinger tap detection
more reliable in this case.
Reviewed by: gonzo
Approved by: gonzo (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10266
Wait for all advertised head packets after status packet have been received.
This fixes rare but quite annoying issue in Elan hw v.4 touchpads support
when triple-finger taps are reported as double-finger taps under several
circumstances.
Reviewed by: gonzo
Approved by: gonzo (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10266
races at least in its ioctl handler, and at the same time it creates
device entry with 0666 permissions.
To plug possible issues in it:
- Mark it as needing Giant.
- Switch device mode to 0600.
Submitted by: C Turt
Reviewed by: imp
MFC after: 1 week
Security: Possible double free in ioctl handler
sys/cam/scsi/scsi_all.h:
Add the SCSI Solid State Media log page (0x11) structure
definition. This gives the percentage used (in terms of
lifetime flash wear) of an SSD.
MFC after: 3 days
Sponsored by: Spectra Logic
From user report, the errors are seen:
error 1
error 1
gptzfsboot: error 1 lba 4294967288
gptzfsboot: error 1 lba 1
gptzfsboot: no ZFS pools located, can't boot
The first two errors above are from issuing INT13 EAX=0x4800, meaning we
need to check if EDD is available and use EAX=0x800 if not.
For an workaround I'm using the similar idea as in biosdisk.c - first probe
ah=8h, then check if we have EDD.
Note we would like to see the correct disk size info, but we *may*
get away with anything >64MB, so we could at least test 2 zfs pool labels
on whole disk setup and not to freak out the INT13 interface.
If we get away with initial disk probing, then we have partition sizes from
the partition table and we should be able to complete the disk probing.
Note: this update does not provide full fix to all errors, but we get
the drvsize() errors removed.
Reported by: Michael W. Lucas
Reviewed by: julian
Differential Revision: https://reviews.freebsd.org/D10591
Extended attributes and their particular implementation in linux are
different from FreeBSD so in this case we have started diverging from
the UFS EA implementation, which would be the natural reference.
Depending on future progress implementing ACLs this approach may change
but for now bring to the tree an implementation that is consistent and
can be tested.
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D10460
After FreeBSD SVN revision 236814, the pass(4) driver changed from
only doing error recovery when the CAM_PASS_ERR_RECOVER flag was
set on a CCB to sometimes doing error recovery if the passed in
retry count was non-zero.
Error recovery would happen if two conditions were met:
1. The error recovery action was simply a retry. (Which is most
cases.)
2. The retry_count is non-zero. (Which happened a lot because of
cut-and-pasted code.)
This explains a bug I noticed in with camcontrol:
# camcontrol tur da34 -v
Unit is ready
# camcontrol reset da34
Reset of 1:172:0 was successful
At this point, there should be a Unit Attention:
# camcontrol tur da34 -v
Unit is ready
No Unit Attention.
Try it again:
# camcontrol reset da34
Reset of 1:172:0 was successful
Now set the retry_count to 0 for the TUR:
# camcontrol tur da34 -v -C 0
Unit is not ready
(pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00
(pass42:mps1:0:172:0): CAM status: SCSI Status Error
(pass42:mps1:0:172:0): SCSI status: Check Condition
(pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred)
(pass42:mps1:0:172:0): Field Replaceable Unit: 2
There is the unit attention. camcontrol(8) has a default
retry_count of 1, in case someone sets the -E flag without
setting -C.
The CAM_PASS_ERR_RECOVER behavior was only broken with the
CAMIOCOMMAND ioctl, which is the synchronous pass(4) API. It has
worked as intended (error recovery is only done when the flag
is set) in the asynchronous API (CAMIOQUEUE ioctl).
sys/cam/scsi/scsi_pass.c:
In passsendccb(), when calling cam_periph_runccb(), only
specify the error routine when CAM_PASS_ERR_RECOVER is set.
share/man/man4/pass.4:
Document that CAM_PASS_ERR_RECOVER is needed to enable
error recovery.
Reported by: Terry Kennedy <TERRY@glaver.org>
PR: kern/218572
MFC after: 1 week
Sponsored by: Spectra Logic
vnet_pf_uninit() is called through vnet_deregister_sysuninit() and
linker_file_unload() when the pf module is unloaded. This is executed
after pf_unload() so we end up trying to take locks which have been
destroyed already.
Move pf_unload() to a separate SYSUNINIT() to ensure it's called after
all the vnet_pf_uninit() calls.
Differential Revision: https://reviews.freebsd.org/D10025
Add IRQ placement-only and ithread-only API variants. intr_event_bind
has been extended with sibling methods, as it has many more callsites in
existing code.
Reviewed by: kib@, adrian@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10586
sys/cam/scsi/scsi_all.c:
In the asc_table, if we get a 0x20,0x02 error ("Access denied -
no access rights"), don't bother retrying. Instead, immediately
fail the command.
This is the error returned by Self Encrypting Drives (SED) when
they are locked.
MFC after: 3 days
Sponsored by: Spectra Logic
Prior to this change, the CRN (Command Reference Number) is reset on any
firmware LIP, LOOP DOWN, or LOOP RESET event in violation of FCP-4 which
specifies that the CRN should only be reset in response to a LIP Reset
(LIPyx) primitive. FCP-4 also indicates PLOGI/LOGO and PRLI/PRLO ELS
actions as conditions for resetting the CRN for the associated initiator
port.
These violations manifest themselves when the HBA is removed from the
loop, or a target device is removed (especially during an outstanding
command) without power cycling. If the HBA and and the target device
determine upon re-establishing the loop that no PLOGI or PRLI is
required, and the target does not issue a LIPxy to the initiator, the
CRN for the target will have been improperly reset by the isp driver. As
a result, the target port will silently ignore all FCP commands issued
during the device probe (which will time out) preventing the device from
attaching.
This change corrects thie CRN reset behavior in response to loop state
changes, also introduces CRN resets for the above mentioned ELS actions
as encountered through async PDB change events.
This change also adds cleanup of outstanding commands in isp_loop_dead()
that was previously missing.
sys/dev/isp/isp.c
Add the last login state to debug output when syncing the pdb
sys/dev/isp/isp_freebsd.c
Replace binary statement setting aborted ccb status in
isp_watchdog() with the XS_SETERR macro used elsewhere
In isp_loop_dead(), abort or complete pending commands as done
in isp_watchdog()
In isp_async(), segregate the ISPASYNC_LOOP_RESET action from
ISPASYNC_LIP, ISPASYNC_LOOP_DOWN, and ISPASYNC_LOOP_UP
fallthroughs, and only reset the CRN in the RESET case. Also add
checks to handle false LOOP RESET actions that do not have a
proper associated LIP primitive, and log the primitive in the
debug messages
In isp_async(), remove the goto from ISP_ASYNC_DEV_STAYED, and
only reset the CRN in the DEV_CHANGED action
In isp_async(), when processing an ISPASYNC_CHANGE_PDB status,
reset CRN(s) for the associated nphdl (or all ports) if the
change reason is some form of ELS login/logout. Also remove
assignment to fc since it is not used in the scope
sys/dev/isp/ispmbox.h
Add macro definition for the global N-Port handle, and correct a
macro typo 'PDB24XX_AE_PRLI_DONJE'
sys/dev/isp/ispvar.h
Add macros FCP_AL_DA_ALL, FCP_AL_PA, and FCP_IS_DEST_ALPD for
more legible code when determining if an AL_PD port matches the
portid for a given struct fcparam* by value or by virtue of the
AL_PD port being 0xFF
Submitted by: Reid Linnemann
Sponsored by: Spectra Logic
MFC after: 1 week
During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
- Create a new file, t4_sched.c, and move all of the code related to
traffic management from t4_main.c and t4_sge.c to this file.
- Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl)
parameters in the PF driver.
- Initialize all the cl_rl limiters with somewhat arbitrary default
rates and provide routines to update them on the fly.
- Provide routines to reserve and release traffic classes.
MFC after: 1 month
Sponsored by: Chelsio Communications
Before this change if_lagg was using nonsleepable rmlocks to protect its
internal state. This patch introduces another sx lock to protect code
paths that require sleeping, while still uses old rmlock to protect hot
nonsleepable data paths.
This change allows to remove taskqueue decoupling used before to change
interface addresses without holding the lock. Instead it uses sx lock to
protect direct if_ioctl() calls.
As another bonus, the new code synchronizes enabled capabilities of member
interfaces, and allows to control them with ifconfig laggX, that was
impossible before. This part should fix interoperation with if_bridge,
that may need to disable some capabilities, such as TXCSUM or LRO, to allow
bridging with noncapable interfaces.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10514
This restores 32bit-sized accesses to vmcnt sysctls, making old
binaries like top(1), systat(8) and reboot(8) mostly functional on
newer kernel.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
At least with Tx FIFO enabled it shows me ~10% reduction of verbose boot
time with serial console at 115200 baud.
Reviewed by: marcel
MFC after: 2 weeks
The current reboot command in efi/loader/main.c is passing extra data with
ResetSystem, however, UEFI spec 2.6, page 265 does state:
"ResetData is only valid if ResetStatus is something other than EFI_SUCCESS
unless the ResetType is EfiResetPlatformSpecific where a minimum amount of
ResetData is always required."
Therefore we should use DataSize 0 and ResetData NULL - those are two last
arguments for the call.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D10562
cannot access the GLOBAL2 register directly.
Despite the comment in code (which was misleading), the indirect access is
only used to read the switch CONFIG data from the scrap register and not
for the GLOBAL2 access.
Use the dsa data to define when the switch is in the Multi Chip Addressing
Mode (a even address different than zero).
While here fix a typo.
Sponsored by: Rubicon Communications, LLC (Netgate)
This patch allows the MTU stored in the hostcache to be used as an
initial value for SCTP paths. When an ICMP PTB message is received,
store the MTU in the hostcache.
MFC after: 1 week
determining the softc of the bridge in psycho_route_interrupt(). [1]
While at it, update the corresponding comment that the code in
question is also necessary for U30s in addition to E450s (a fact
that has been known for ages).
PR: 218478
Submitted by: Yoshihiko Iwama
The code specified the length of a layout as INT64_MAX instead of
UINT64_MAX. This could result in getting a layout for less than the
full file for extremely large files. Although having little practical
effect, this patch corrects this in the code.
Detected during recent testing of the pNFS server.
MFC after: 2 weeks
In exceptional circumstances, an MCA exception will trigger when the
freelist is exhausted. In such a case, no error will be logged on the list
and 'mca_count' will not be incremented.
Prior to this patch, all CPUs that received the exception would spin
forever.
With this change, the CPU that detects the error but finds the freelist
empty will proceed to panic the machine, ending the deadlock.
A follow-up to r260457.
Reported by: Ryan Libby <rlibby at gmail.com>
Reviewed by: jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10536
It improves interoperability with if_bridge, which may need to disable
some capabilities not supported by other members. IMHO there is still
open question about LRO capability, which may need to be disabled on
physical interface.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
7740 fix for 6513 only works in hole punching case, not truncation
illumos/illumos-gate@7de35a3ed07de35a3ed0https://www.illumos.org/issues/7740
The problem is that dbuf_findbp will return ENOENT if the block it's
trying to find is beyond the end of the file. If that happens, we assume
there is no birth time, and so we lose that information when we write
out new blkptrs. We should teach dbuf_findbp to look for things that are
beyond the current end, but not beyond the absolute end of the file.
To verify, create a large file, truncate it to a short length, and then
write beyond the end. Check with zdb to make sure that there are no
holes with birth time zero (will appear as gaps).
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Paul Dagnelie <pcd@delphix.com>
7743 per-vdev-zaps have no initialize path on upgrade
illumos/illumos-gate@555da5111b555da5111bhttps://www.illumos.org/issues/7743
When loading a pool that had been created before the existance of
per-vdev zaps, on a system that knows about per-vdev zaps, the
per-vdev zaps will not be allocated and initialized.
This appears to be because the logic that would have done so, in
spa_sync_config_object(), is not reached under normal operation. It is
only reached if spa_config_dirty_list is non-empty.
The fix is to add another `AVZ_ACTION_` enum that will allow this code
to be reached when we detect that we're loading an old pool, even when
there are no dirty configs.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
7613 ms_freetree[4] is only used in syncing context
illumos/illumos-gate@5f145778015f14577801https://www.illumos.org/issues/7613
metaslab_t:ms_freetree[TXG_SIZE] is only used in syncing context. We should
replace it with two trees: the freeing tree (ranges that we are freeing this
syncing txg) and the freed tree (ranges which have been freed this txg).
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Some notes:
- Only i386 and amd64 layouts are checked, other Tier-1 (or close to
it) architectures would benefit from the same check.
- Unconditional enabling of the asserts depend on the stability of locks
memory layout. If locks are optimized to avoid bloat when some debugging
or profiling features turned off, it makes sense to only assert layout
for production configs.
Reviewed by: badger, emaste, jhb, vangyzen
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D10526
7586 remove #ifdef __lint hack from dmu.h
illumos/illumos-gate@4ba5b961634ba5b96163https://www.illumos.org/issues/7586
The #ifdef __lint in dmu.h is ugly, and it would be nice not to duplicate it if
we add other inline functions into header files in ZFS, especially since it is
difficult to make any other solution work across all compilation targets. We
should switch to disabling the lint flags that are failing instead.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
available and if that is true make use of them.
Thank you very much to Andrew Turner for providing help and review the patch!
Reviewed by: andrew
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10499
7580 ztest failure in dbuf_read_impl
illumos/illumos-gate@1a01181fdc1a01181fdchttps://www.illumos.org/issues/7580
We need to prevent any reader whenever we're about the zero out all the
blkptrs. To do this we need to grab the dn_struct_rwlock as writer in
dbuf_write_children_ready and free_children just prior to calling bzero.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
- Rename the default implementation of 'pcib_request_feature' and add
a pcib_request_feature() wrapper function (as is often done for
new-bus APIs implemented via kobj) that accepts a single function.
Previously the call to pcib_request_feature() ended up invoking the
method on the great-great-grandparent of the bridge device instead
of the grandparent. For a bridge that was a direct child of pci0 on
x86 this resulted in the method skipping over the Host-PCI bridge
driver and being invoked against nexus0
- When invoking _OSC from a Host-PCI bridge driver, invoke
device_get_softc() against the Host-PCI bridge device instead of the
child bridge that is requesting HotPlug. Using the wrong softc data
resulted in garbage being passed for the ACPI handle causing the
_OSC call to fail.
- While here, perform some other cleanups to _OSC handling in the ACPI
Host-PCI bridge driver:
- Don't invoke _OSC when requesting a control that has already been
granted by the firmware.
- Don't set the first word of the capability array before invoking
_OSC. This word is always set explicitly by acpi_EvaluateOSC()
since it is UUID-independent.
- Don't modify the set of granted controls unless _OSC doesn't exist
(which is treated as always successful), or the _OSC method
doesn't fail.
- Don't require an _OSC status of 0 for success. _OSC always
returns the updated control mask even if it returns a non-zero
status in the first word.
- Whine if _OSC ever tries to revoke a previously-granted control.
(It is not supposed to do that.)
- While here, add constants for the _OSC status word in acpivar.h
(though currently unused).
Reported by: adrian
Reviewed by: imp
MFC after: 1 week
Tested on: Lenovo x220
Differential Revision: https://reviews.freebsd.org/D10520
7606 dmu_objset_find_dp() takes a long time while importing pool
illumos/illumos-gate@7588687e6b7588687e6bhttps://www.illumos.org/issues/7606
When importing a pool with a large number of filesystems within the same
parent filesystem, we see that dmu_objset_find_dp() takes a long time.
It is called from 3 places: spa_check_logs(), spa_ld_claim_log_blocks(),
and spa_load_verify().
There are several ways to improve performance here:
1. We don't really need to do spa_check_logs() or
spa_ld_claim_log_blocks() if the pool was closed cleanly.
2. spa_load_verify() uses dmu_objset_find_dp() to check that no
datasets have too long of names.
3. dmu_objset_find_dp() is slow because it's doing
zap_value_search() (which is O(N sibling datasets)) to determine
the name of each dsl_dir when it's opened. In this case we
actually know the name when we are opening it, so we can provide
it and avoid the lookup.
This change implements fix#3 from the above list; i.e. make
dmu_objset_find_dp() provide the name of the dataset so that we don't
have to search for it.
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>
linux_page_address() function in the LinuxKPI. This solves an issue
where the return value from linux_page_address() is passed to
kmem_free().
MFC after: 1 week
Sponsored by: Mellanox Technologies
The nfsv4_seqsession() call returns NFSERR_REPLYFROMCACHE when it has a
reply in the session, due to a requestor retry. The code erroneously
assumed a return of 0 for this case. This patch fixes this and adds
a KASSERT(). This would be an extremely rare occurrence. It was found
during code inspection during the pNFS server development.
MFC after: 2 weeks
busdma know, so that on architectures where dma isn't always coherent, we
know we don't have to write-back/invalidates cachelines on DMA operations.
Reviewed by: andrew, mav
validation of SEG.ACK as the first step. If the ACK is not acceptable,
a RST segment should be sent and the segment should be dropped.
Up to now, the segment was partially processed.
This patch moves the check for the SEG.ACK validation up to the front
as required.
Reviewed by: hiren, gnn
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D10424
PCB SP cache acquires extra reference, when SP is stored in the cache.
Release this reference when PCB is destroyed in ipsec_delete_pcbpolicy().
In ipsec_copy_pcbpolicy() release reference to SP in case if sp_in or
sp_out are not NULL.
Reported by: Slawa Olhovchenkov <slw at zxy spb ru>
MFC after: 1 week
before we increase irq again, or we'd end up choosing an irq, and then
really using the next one, even if it's not available.
Also in the inner loop, correct the end check so that we check every irq,
even the last one.
This makes the msk(4) adapter able to use MSI on Softiron Overdrive 1000.
This check has been redundant since it was introduced in r162554.
Reviewed by: emaste, glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10322
7252 7628 compressed zfs send / receive
illumos/illumos-gate@5602294fda5602294fdahttps://www.illumos.org/issues/7252
This feature includes code to allow a system with compressed ARC enabled to
send data in its compressed form straight out of the ARC, and receive data in
its compressed form directly into the ARC.
https://www.illumos.org/issues/7628
We should have longer, more readable versions of the ZFS send / recv options.
7628 create long versions of ZFS send / receive options
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: David Quigley <dpquigl@davequigley.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Dan Kimmel <dan.kimmel@delphix.com>
wait for the next enqueue from the driver.
Reviewed by: gnn@, hselasky@, gallatin@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D10432
patm(4) devices.
Maintaining an address family and framework has real costs when we make
infrastructure improvements. In the case of NATM we support no devices
manufactured in the last 20 years and some will not even work in modern
motherboards (some newer devices that patm(4) could be updated to
support apparently exist, but we do not currently have support).
With this change, support remains for some netgraph modules that don't
require NATM support code. It is unclear if all these should remain,
though ng_atmllc certainly stands alone.
Note well: FreeBSD 11 supports NATM and will continue to do so until at
least September 30, 2021. Improvements to the code in FreeBSD 11 are
certainly welcome.
Reviewed by: philip
Approved by: harti
The NFSv4 RFCs give a server the option of allowing the use of an open
stateid for write access to be used for a Read operation.
This patch enables this by default and adds a sysctl to disable it,
for anyone who does not want this capability.
Allowing this is particularily useful for a pNFS Data Server (DS), since
they are not permitted to allow the use of special stateids.
Discovered during recent testing of the pNFS server under development.
MFC after: 2 weeks
BHND_EROM_DUMP() method.
Dump the EROM tables to the coneole on mips/broadcom devices if bootverbose
is enabled; this functionality is primarily useful when debugging SoC EROM
parsing and device matching issues during early boot.
Reviewed by: mizhka
Approved by: adrian (mentor)
Sponsored by: Plausible Labs
Differential Revision: https://reviews.freebsd.org/D10122
kernel calls this directly so the event handler is not called, meaning
the computer fails to reboot.
Tested by: cognet
MFC after: 1 week
Sponsored by: DARPA, AFRL
Hyper-V hot channel effect:
Operation latency on hot channel is only _half_ of the operation
latency on cold channels.
This commit takes the advantage of the above Hyper-V host channel
effect, and can reduce more than 75% latency and more than 50%
latency stdev, i.e. lower and more stable/predictable latency,
for various types of web server workloads.
MFC after: 3 days
Sponsored by: Microsoft
An NFSv4 server has the option of allowing a Read to be done using a Write
Open. If this is not allowed, the server will return NFSERR_OPENMODE.
This patch attempts the read with a write open and then disables this
if the server replies NFSERR_OPENMODE.
This change will avoid some uses of the special stateids. This will be
useful for pNFS/DS Reads, since they cannot use special stateids.
It will also be useful for any NFSv4 server that does not support reading
via the special stateids. It has been tested against both types of NFSv4 server.
MFC after: 2 weeks
The NFSv4.1/pNFS client does not use/need a backchannel for the Data Server (DS)
sessions, so the flag should only be set for MetaData Server (MDS) sessions.
This patch should have been a part of r317275.
MFC after: 2 weeks
colors.
Colors are still hard-coded as 15 (normally lightwhite) for the interior
and 0 (normally black) for the border, but these are now values used in
2 expressions instead of built in to the algorithm. The algorithm used
a fancy and/or method, but this gives no control over the colors except
and'ing all color planes off gives black and or'ing all color planes on
gives lightwhite. Just draw the border and interior in separate colors
using the same method as for characters, including its complications to
optimize for VGA adaptors. Optimization is not really needed here, but
for the VGA case it avoids being slower than the and/or method. The
optimization is worth about 30%.
The "return layout on close" case in the pNFS client was badly broken.
Fortunately, extant pNFS servers that I have tested against do not
do this. This patch fixes it. It also changes the way the layout stateid.seqid
is set for LayoutReturn. I think this change is correct w.r.t. the RFC,
but I am not 100% sure.
This was found during recent testing of the pNFS server under development.
MFC after: 2 weeks
The NFSv4.1/pNFS client wasn't doing a newnfs_disconnect() call for the
connection to the Data Server (DS) under some circumstances. The main
effect of this was a leak of malloc'd structures in the krpc. This patch
adds the newnfs_disconnect() calls to fix this.
Detected during recent testing against the pNFS server under development.
MFC after: 2 weeks
Rename the mtu variable in ip6_fragment(), because mtu is misleading. The
variable actually holds the fragment length.
No functional change.
Suggested by: ae
For now there isn't any per-VAP WME state. The eventual aim is to migrate
the driver direct use of WME parameters over to use these methods as
appropriate (global for most devices, per-VAP for firmware NICs that support
it) in preparation for actual per-VAP WME (and other thing) state change
support.
In my eagerness to eliminate a branch which is taken once per 2^38
bytes of keystream, I forgot that the state words are in host order.
Thus, the counter increment code worked fine on little-endian
machines, but not on big-endian ones. Switch to a simpler (branchful)
solution.
The NFSv4 Setattr operation always has reply data even when it fails,
so don't set the ND_NOMOREDATA for it. This would only affect unusual
cases where Setattr fails and the RPC code wants to parse the rest of
the compound. Detected during recent development related to the pNFS server.
MFC after: 2 weeks
An NFSv4.1 client connection to a Data Server (DS) should not have a
backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel
for this case.
Found during recent testing with the pNFS server under development.
MFC after: 2 weeks
Implement FUSE open flag FOPEN_KEEP_CACHE. Without this flag, cached file
contents should be invalidated on open. Apparently, fusefs-encfs relies
upon this behavior.
PR: 218636
Submitted by: Ben RUBSON <ben.rubson at gmail.com>
The nfscl_mtofh() function didn't check for failed operations and, as such,
would have returned EBADRPC for these cases, due to parsing failure.
This patch adds checks, so that it returns with ND_NOMOREDATA set.
This is needed for future use in the pNFS server and acts as a safety
belt in the meantime.
MFC after: 2 weeks
The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. The default
values of 0 meant that setting a group to "wheel" would fail even when
done by root.
It also adds a definition of GID_NOGROUP to sys/conf.h.
Discussed on: freebsd-current@
MFC after: 2 weeks
7386 zfs get does not work properly with bookmarks
illumos/illumos-gate@edb901aab9edb901aab9https://www.illumos.org/issues/7386
The zfs get command does not work with the bookmark parameter while it works
properly with both filesystem and snapshot:
# zfs get -t all -r creation rpool/test
NAME PROPERTY VALUE SOURCE
rpool/test creation Fri Sep 16 15:00 2016 -
rpool/test@snap creation Fri Sep 16 15:00 2016 -
rpool/test#bkmark creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test@snap
NAME PROPERTY VALUE SOURCE
rpool/test@snap creation Fri Sep 16 15:00 2016 -
# zfs get -t all -r creation rpool/test#bkmark
cannot open 'rpool/test#bkmark': invalid dataset name
#
The zfs get command should be modified to work properly with bookmarks too.
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Marcel Telka <marcel@telka.sk>
remove the former.
All other EGA/VGA methods were already shared, with VGA-only features
mostly not used and no decisions in inner loops to optimize fof VGA,
but this method was split up because it is the only important one and
using VGA methods if possible is about twice as fast. The speed is
mostly not from splitting to reduce branches but from doing half as
many bus accesses, so make this easier to maintain by not splitting.
There is now 1 extra branch in an inner loop where it costs less than
1% of the bus access overhead on Haswell even if the compiler schedules
it poorly.
Before this change it was impossible to set number of PKCS#5v2 iterations,
required to set passphrase, if it has two keys and never had any passphrase.
Due to present metadata format limitations there are still cases when number
of iterations can not be changed, but now it works in cases when it can.
PR: 218512
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10338
vga planar method (for testing that was supposed to be local that the
former still works). The ega method works on vga but is about twice
as slow. The vga method doesn't work on ega.
Optimize the main vga planar method a little. For changing the
background color (which was otherwise optimized better than most
things), don't switch the write mode from 3 to 0 just to select
the pixel mask of 0xff obscurely by writing 0. Just write 0xff
directly.
-(SYNCOOKIE_LIFETIME + 1) instead of INT64_MIN, since it is
good enough and works when time_t is int32 or int64.
This fixes the issue reported by cy@ on i386.
Reported by: cy
MFC after: 1 week
Sponsored by: Netflix, Inc.
The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. Also, the default
values of 0 meant that setting a group to "wheel" would fail even when
done by root and this patch fixes this issue.
MFC after: 2 weeks
7490 real checksum errors are silenced when zinject is on
illumos/illumos-gate@6cedfc397d6cedfc397dhttps://www.illumos.org/issues/7490
When zinject is on, error codes from zfs_checksum_error() can be overwritten
due to an incorrect and overly-complex if condition.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
7448 ZFS doesn't notice when disk vdevs have no write cache
illumos/illumos-gate@295438ba32295438ba32https://www.illumos.org/issues/7448
I built a SmartOS image with all the NVMe commits including 7372
(support NVMe volatile write cache) and repeated my dd testing:
> #!/bin/bash
> for i in `seq 1 1000`; do
> dd if=/dev/zero of=file00 bs=1M count=102400 oflag=sync &
> dd if=/dev/zero of=file01 bs=1M count=102400 oflag=sync &
> wait
> rm file00 file01
> done
>
Previously each dd command took ~145 seconds to finish, now it takes
~400 seconds.
Eventually I figured out it is 7372 that causes unnecessary
nvme_bd_sync() executions which wasted CPU cycles.
If a NVMe device doesn't support a write cache, the nvme_bd_sync function will
return ENOTSUP to indicate this to upper layers.
It seems this returned value is ignored by ZFS, and as such this bug is not
really specific to NVMe. In vdev_disk_io_start() ZFS sends the flush to the
disk driver (blkdev) with a callback to vdev_disk_ioctl_done(). As nvme filled
in the bd_sync_cache function pointer, blkdev will not return ENOTSUP, as the
nvme driver in general does support cache flush. Instead it will issue an
asynchronous flush to nvme and immediately return 0, and hence ZFS will not set
vdev_nowritecache here. The nvme driver will at some point process the cache
flush command, and if there is no write cache on the device it will return
ENOTSUP, which will be delivered to the vdev_disk_ioctl_done() callback. This
function will not check the error code and not set nowritecache.
The right place to check the error code from the cache flush is in
zio_vdev_io_assess(). This would catch both cases, synchronous and asynchronous
cache flushes. This would also be independent of the implementation detail that
some drivers can return ENOTSUP immediately.
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Obtained from: Illumos
The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in
the reply to an Open with exclusive_create, as required by the RFCs.
(This is required since the FreeBSD NFS server stores the create_verifier
in the va_atime attribute.)
As such, the Linux NFSv4 client did not set the TimeAccess (atime) in
the Setattr done in an RPC after the one with the Open/exclusive_create.
This patch fixes the server to set the TimeAccess bit in the reply.
I believe that storing the create_verifier in an extended attribute for
file systems that support extended attributes might be a good idea,
but I will wait for a discussion of this on the freebsd-fs@ email list
before considering committing a patch to do this.
Reported by: jim@ks.uiuc.edu
Suggested by: dfr
MFC after: 2 weeks
7430 Backfill metadnode more intelligently
illumos/illumos-gate@af346df588af346df588https://www.illumos.org/issues/7430
Description and patch from brought over from the following ZoL commit: https://
github.com/zfsonlinux/zfs/commit/68cbd56e182ab949f58d004778d463aeb3f595c6
Only attempt to backfill lower metadnode object numbers if at least
4096 objects have been freed since the last rescan, and at most once
per transaction group. This avoids a pathology in dmu_object_alloc()
that caused O(N^2) behavior for create-heavy workloads and
substantially improves object creation rates. As summarized by
@mahrens in #4636:
"Normally, the object allocator simply checks to see if the next
object is available. The slow calls happened when dmu_object_alloc()
checks to see if it can backfill lower object numbers. This happens
every time we move on to a new L1 indirect block (i.e. every 32 *
128 = 4096 objects). When re-checking lower object numbers, we use
the on-disk fill count (blkptr_t:blk_fill) to quickly skip over
indirect blocks that don?t have enough free dnodes (defined as an L2
with at least 393,216 of 524,288 dnodes free). Therefore, we may
find that a block of dnodes has a low (or zero) fill count, and yet
we can?t allocate any of its dnodes, because they've been allocated
in memory but not yet written to disk. In this case we have to hold
each of the dnodes and then notice that it has been allocated in
memory.
The end result is that allocating N objects in the same TXG can
require CPU usage proportional to N^2."
Add a tunable dmu_rescan_dnode_threshold to define the number of
objects that must be freed before a rescan is performed. Don't bother
to export this as a module option because testing doesn't show a
compelling reason to change it. The vast majority of the performance
gain comes from limit the rescan to at most once per TXG.
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Ned Bass <bass6@llnl.gov>
Obtained from: Illumos
overflows, syncookies are used.
This patch restricts the usage of syncookies in this case: accept
syncookies only if there was an overflow of the syncache recently.
This mitigates a problem reported in PR217637, where is syncookie was
accepted without any recent drops.
Thanks to glebius@ for suggesting an improvement.
PR: 217637
Reviewed by: gnn, glebius
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D10272
a pointer to the main ega drawing method which is misoptimized be in
a different function than the main vga planar mode drawing method.
Vga initialization handles everything with no extra code except for
selecting the different function.
corresponding to the gaps between characters. This fixes distortion
of the cursor due to expanding it across the gaps.
Again for character width 9, when the cursor characters are not in the
graphics range (0xb0-0xdf), the gaps were always there (filled in the
background color for the previous char). They still look strange, but
don't cause distortion. When the cursor characters are in the graphics
range, the gaps are filled by repeating the previous line. This gives
distortion with cilia. Removing vertical lines reduces the distortion
to vertical cilia.
Move the default for the cursor characters out of the graphics range.
With character width 9, this gives gaps instead of distortion and
other problems. With character width 8, it just fixes a smaller set
of other problems. Some distortion and other problems can be recovered
using vidcontrol -M. Presumably the default was to fill the gaps
intentionally, but it is much better to leave gaps. The gaps can even
be considered as a feature for text processing -- they give sub-pointers
to character boundaries. The other problems are: (1) with character
width 9, characters near the cursor are moved into the graphics range
and thus distorted if any of their 8th bits is set; (2) conflicts with
national characters in the graphics range.
The default range for the graphics cursor characters is now 8-11. This
doesn't conflict with anything, since the glyphs for the characters in
this range are unreachable.
Use the 10x16 mouse cursor in text mode too (if the font size is >= 14).
When the character width is 9, removal of 1 or 2 vertical lines makes
10x16 cursor no wider than the 9x13 one usually was. We could even
handle cursors 1 pixel wider in 2 character cells and gaps without
more clipping than given by the gaps (the worst case is 1 pixel in the
left cell, 1 removed in the middle gap, 8 in the right cell and 1
removed in the right gap. The pixel in the right gap is removed so
it doesn't matter if it is in the font).
When the character width is 8, we now clip the 10-wide cursor by 1
pixel in the worst case. This clipping is usually invisible since it
is of the border and and the border usually merges with the background
so is invisible. There should be an option to use reverse video to
highlight the border and its tip instead of the interior (graphics
mode can do better using separate colors). This needs the 9x13 cursor
again.
Ideas from: ache (especially about the bad default character range)
Note that KVA mapping of the framebuffer already uses write-combining
mode, so the change, besides improving speed of user mode writes, also
satisfies requirement of the IA32 architecture of using consistent
caching modes for multiple mappings of the same page.
Reported and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
not consider a "disabled" cpu as a CPU we have to ignore, and we should use
them if they provide a "enable-method".
While I'm there, support "ok" as well as "okay", while ePAPR only accepts
"okay", linux accepts "ok" too so we can expect it to be used.
Reviewed by: andrew (partially)
9 wide.
I only need this to improve the mouse cursor, but it has always been
needed to select and/or adjust fonts.
This is complicated because there are no standard parameter tables
giving this bit of information directly, and the device register bit
giving the information can't be trusted even if it is read from the
hardware. Use a heuristic to guess if the device register can be
trusted. (The device register is normally read from the BIOS mode
table, but on my system where the device register is wrong, the mode
table doesn't match the hardware and is not used; the device registers
are used in this case.)
When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.
For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).
We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.
Reported by: Antonios Atlasis <aatlasis at secfu net>
MFC after: 3 days
mode.
Use the general DRAWPIXEL() macro with its bigger case statement
(twice) instead of our big case statement (once). DRAWPIXEL() is more
complicated since it is not missing support for depth 24 or
complications for colors in depth 16 (we currently hard-code black and
white so the complications for colors are not needed). DRAWPIXEL()
also does the bpp calculation in the inner loop. Compilers optimize
DRAWPIXEL() well enough, and the main text drawing method always
depended on this. In direct mode, mouse cursor drawing is now similar
to normal text drawing except it draws in 2 hard-coded colors instead
of 1 variable color.
This also fixes a nested hard-coding of colors. DRAWPIXEL() uses the
palette in all cases, but the direct code didn't use the palette for
its hard-coded black. This only had an effect in depth 8, since
changing the palette is not supported in other depths.
direct mode renderer. I thought that reads were not much slower than
writes, so that the method only tripled the time for the whole function,
but I recently measured that video memory reads can be up to 53 times
slower than writes in tighter loops than here. Loop overheap here
reduces the multiplier to only 16-20 on Haswell.
Start cleaning up and fixing larger bugs in this function. Only replace
the 22-line removal loop by a 3-line one for now, since adjusting the
old loop would have required many palette calculations which are better
done in the DRAW_PIXEL() macro. This also fixes missing support for
depth 24, but only for removal.
Removal is currently sloppy at the right bottom corner. It sometimes
leaks border color into the text window. This is soon cleaned up by the
caller. The planar renderer has complications to clip at the corner.
using a driver-supplied sbuf for printing device discovery
announcements. This helps ensure that messages to the console
will be properly serialized (through sbuf_putbuf) and not be
truncated and interleaved with other messages. The
infrastructure mirrors the existing xpt_announce_periph()
entry point and is opt-in for now. No content or formatting
changes are visible to the operator other than the new coherency.
While here, eliminate the stack usage of the temporary
announcement buffer in some of the drivers. It's moved to the
softc for now, but future work will eliminate it entirely by
making the code flow more linear. Future work will also address
locking so that the sbufs can be dynamically sized.
The scsi_da, scs_cd, scsi_ses, and ata_da drivers are converted
at this point, other drivers can be converted at a later date.
A tunable+sysctl, kern.cam.announce_nosbuf, exists for testing
purposes but will be removed later.
TODO:
Eliminate all of the code duplication and temporary buffers. The
old printf-based methods will be retired, and xpt_announce_periph()
will just be a wrapper that uses a dynamically sized sbuf. This
requires that the register and deregister paths be made malloc-safe,
which they aren't currently.
Sponsored by: Netflix
of our tweaked modes based on it. In practice, this means limiting the
tweaked modes to at most 80x50 based on 80x25, so there are no 90-column,
80x30 or 80x60 modes.
This happens when the the initial mode is is not in the parameter
table. We always detected this case, but assumed that the (necessarily
nonstandard) parameters of the initial mode could be tweaked just as
blindly as the probably-standard parameters of initial modes in the
table.
On 1 laptop system with near-VGA where the initial mode is nonstandard,
this is because the hardware apparently doesn't support 9-bit mode,
but otherwise has standard timing. The initial mode has 8-bit mode
CRTC horizontal parameters similar to those in syscons' 90-column modes
and in EGA modes. Tweaking these values for the 90-column modes has
little effect except to print the extra 10 columns off the screen.
Tweaking from 80x25 to 80x30 requires changing from 400 scan lines to
480. This can probably be made to work, but syscons blindly applies
values based on standard timing. This gives blank output. Tweaking
from 80x25 to 80x50 doesn't change the CRTC timing and works.
kit, CK, in the LinuxKPI.
When threads are pinned to a CPU core or when there is only one CPU,
it can happen that a higher priority thread can call the CK
synchronize function while a lower priority thread holds the read
lock. Because the CK's synchronize is a simple wait loop this can lead
to a deadlock situation. To solve this problem use the recently
introduced CK's wait callback function.
When detecting a CK blocking condition figure out the lowest priority
among the blockers and update the calling thread's priority and
yield. If another CPU core is holding the read lock, pin the thread to
the blocked CPU core and update the priority. The calling threads
priority and CPU bindings are restored before return.
If a thread holding a CK read lock is detected to be sleeping, pause()
will be used instead of yield().
MFC after: 1 week
Sponsored by: Mellanox Technologies
CPUs when allocating a LinuxKPI workqueue. This also ensures that the
created taskqueue always have a non-zero number of worker threads.
MFC after: 1 week
Sponsored by: Mellanox Technologies
According to Warner, multiple TRIM BIOs are collapsed into a single CCB with
NULL bp. It is invalid to biotrack() NULL, and results in a fault. So,
don't do that.
Reported by: asomers@
Sponsored by: Dell EMC Isilon
As the uboot disk interface is using common/disk.c API, we also
should use disk_ioctl() call, this will give us chance to read partition
sizes and have feature parity with UEFI and BIOS implementations.
This does also fix arm boot issue on some systems, reported/tested by Ian,
thanks.
Reported by: ian
Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D10421
The work to make it possible to avoid bcache via using F_NORA modifier did
miss the fact that not all loader platforms are using the bcache, and so
it is possible the modifier is not cleared, as bcache strategy function is
not used.
For fix, we make sure the checks are dont with masked flag.
This patch does fix boot for platforms which do not use bcache.
Reported by: emaste
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D10422
The smallest device we can have in the pool is 64MB, since we are trying to
walk all four labels to find the most up to date uberblock, this limit will
also give us good method to check if we even should attempt to probe.
Enforcing the check also will make sure we are not getting wrapped while
calculating the label offset.
Also, after label check, we should verify if we actually got any UB or not.
PR: 218473
Reported by: Masachika ISHIZUKA
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D10381
Add early init handler, which comprises various internal
bus optimisations for Armada 38x SoC's. Magic values used
due to undocumented registers.
Submitted by: Marcin Wojtas <mw@semihalf.com>,
Arnaud Ysmal <arnaud.ysmal@stormshield.eu>
Obtained from: Semihalf, Stormshield
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10219
Part of PL310 erratum 727915 in pl310_wbinv_range() was
executed uncoditionally for all possible controllers'
revisions. This patch adds appropriate condition, since
extra operations are required only for revisions between
r2p0 and r3p0.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: meloun-miracle-cz
Differential revision: https://reviews.freebsd.org/D10221
Introduce machine-dependent part of the arm/pl310 driver for
Armada 38x SoCs. Add prefetch and power savings configuration.
Submitted by: <arnaud.ysmal@stormshield.eu>
Obtained from: Stormshield
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10220
Memory space reserved for pmap_kernel_l2dtable_kva and
pmap_kernel_l2ptp_kva has not been taken into account in
original code. All the memory reserved from kernel space by
pmap_alloc_specials() function called in pmap_bootstrap()
should be mapped initially by initarm(). To create initial
mapping initarm() function reserves proper number of l2 page
tables. However the number of the l2 page tables does not take
into account memory for: pmap_kernel_l2ptp_kva,
pmap_kernel_l2dtable_kva, crashdumpmap, etc.
Submitted by: Grzegorz Bernacki <gjb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: meloun-miracle-cz
Differential revision: https://reviews.freebsd.org/D10217
VM_KMEM_SIZE_MAX allows to limit kmem arena size. In our case this was
necessary, as decreasing size of kmem_arena leaves more space for
kernel_arena.
kernel_arena is pool used for contigmalloc (in effect, DMA) allocations,
which failed on Armada38x. This resulted in 'no memory errors'
(e.g. USB_ERR_NOMEM errors) and failure of whole system. The need for
greater size of kernel_arena probably comes from more peripherals making
use of busdma.
Value used as upper limit is half of the default value
(0x1399a000).
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10216
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156