These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
MFC after: 3 weeks
9341-4i controller was to ensure that scatter/gather lists are ended with
an end-of-list marker. Both the mrsas and Linux megaraid_sas drivers use
this marker with Invader cards as well, so we do the same thing, though
it is apparently not strictly necessary.
Reviewed by: ambrisko
Tested by: ambrisko (Invader card)
MFC after: 3 weeks
Sponsored by: Sandvine Inc.
allow mrsas(4) from LSI to attach to newer LSI cards that are support by
mrsas(4). If mrsas(4) is not loaded into the system at boot then mfi(4)
will always attach. If a modified mrsas(4) is loaded in the system. That
modification is return "-30" in it's probe since that is between
BUS_PROBE_DEFAULT and BUS_PROBE_LOW_PRIORITY.
This option is controller by a new probe flag "MFI_FLAGS_MRSAS" in mfi_ident
that denotes cards that should work with mrsas(4). New entries that should
have this option.
This is the first step to get mrsas(4) checked into FreeBSD and to avoid
collision with people that use mrsas(4) from LSI. Since mfi(4) takes
priority, then mrsas(4) users need to rebuild GENERIC. Using the
.disabled="1" method doesn't work since that blocks attaching and the
probe gave it to mfi(4).
Discussed with: LSI (Kashyap Desai)
make CAM to not try negotiate unsupported settings and suppress warnings.
While there, enable command queuing on pass-through devices, announced
in hba_inquiry, but disabled. Even though queue size is very small, It
seems working well enough.
Reviewed by: scottl
MFC after: 2 weeks
structure in the driver.
Having these in 10.0 means that mfiutil can be modified to take adavantage
of new updates without a kernel recompile.
Approved by: re (gjb)
MFC after: 2 weeks
in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.
The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
xpt_rescan() expects the SIM lock to be held, and we trip a mtx_assert if
the driver initiates multiple rescans in quick succession.
Reported by: sbruno
Tested by: sbruno
MFC after: 1 week
real JBOD mode (SYS PD) would fail fairly reliably during I/O.
Steal the mfi_disk.c check for this condition (indirectly) when establishing
d_maxsize.
Reviewed by: ambrisko@
MFC after: 4 weeks
Sponsored by: Yahoo! Inc.
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.
This changes fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.
Submitted by: jhb
Reviewed by: jfv, marius, achadd, achim
MFC after: 1 day
While this prevents commands getting stuck forever there is no way to guarantee
that data from the command hasn't been committed to the device.
In addition older mfi firmware has a bug that would cause the controller to
frequently stall IO for over our timeout value, which when combined with
a forced timeout often resulted in panics in UFS; which would otherwise be
avoided when the command eventually completed if left alone.
For reference this timeout issue is resolved in Dell FW package 21.2.1-0000.
Fixed FW package version for none Dell controller will likely vary.
MFC after: 2 days
Stop abusing xpt_periph in random plases that really have no periph related
to CCB, for example, bus scanning. NULL value is fine in such cases and it
is correctly logged in debug messages as "noperiph". If at some point we
need some real XPT periphs (alike to pmpX now), quite likely they will be
per-bus, and not a single global instance as xpt_periph now.
relearning. Specifically, add subcommands to mfiutil(8) which allow the
user to set the BBU and autolearn modes when the firmware supports it,
and add a subcommand which kicks off a battery relearn.
Reviewed by: sbruno, rstone
Tested by: sbruno
Approved by: rstone (co-mentor)
MFC after: 2 weeks
Sponsored by: Sandvine Incorporated
without first removing the command from the relavent queue.
This was causing panics in the queue functions which check to ensure a command
is not on another queue.
Fixed some cases where the error from mfi_mapcmd was lost and where the command
was never released / dequeued in error cases.
Ensure that all failures to mfi_mapcmd are logged.
Fixed possible null pointer exception in mfi_aen_setup if mfi_get_log_state
failed.
Fixed mfi_parse_entries & mfi_aen_setup not returning possible errors.
Corrected MFI_DUMP_CMDS calls with invalid vars SC vs sc.
Commands which have timed out now set cm_error to ETIMEDOUT and call
mfi_complete which prevents them getting stuck in the busy queue forever.
Fixed possible use of NULL pointer in mfi_tbolt_get_cmd.
Changed output formats to be more easily recognisable when debugging.
Optimised mfi_cmd_pool_tbolt cleanup.
Made information about driver limiting commands always display as for modern
cards this can be severe.
Fixed mfi_tbolt_alloc_cmd out of memory case which previously didnt return an
error.
Added malloc checks for request_desc_pool including free when subsiquent errors
are detected.
Fixed overflow error in SIMD reply descriptor check.
Fixed tbolt_cmd leak in mfi_build_and_issue_cmd if there's an error during IO
build.
Elimintated double checks on sc->mfi_aen_cm & sc->mfi_map_sync_cm in
mfi_shutdown.
Move local hdr calculation after error check in mfi_aen_complete.
Fixed wakeup on NULL in mfi_aen_complete.
Fixed mfi_aen_cm cleanup in mfi_process_fw_state_chg_isr not checking if it was
NULL.
Changed mfi_alloc_commands to error if bus_dmamap_create fails. Previously we
would try to continue with the number of allocated commands but lots of places
in the driver assume sc->mfi_max_fw_cmds is whats available so its unsafe to do
this without lots of changes.
Removed mfi_total_cmds as its no longer used due the above change.
Corrected mfi_tbolt_alloc_cmd to return ENOMEM where appropriate.
Fixed timeouts actually firing at double what they should.
Setting hw.mfi.max_cmds=-1 now configures to use the controller max.
A few style (9) fixes e.g. braced single line conditions and double blank lines
Cleaned up queuing macros
Removed invalid queuing tests for multiple queues
Trap and deal with errors when doing sends in mfi_data_cb
Refactored frame sending into one method with error checking of the return
code so we can ensure commands aren't left on the queue after error. This
ensures that mfi_mapcmd & mfi_data_cb leave the queue in a valid state.
Refactored how commands are cleaned up, mfi_release_command now ensures
that all queues and command state is maintained in a consistent state.
Prevent NULL pointer use in mfi_tbolt_complete_cmd
Fixed use of NULL sc->mfi_map_sync_cm in wakeup
Added defines to help with output of mfi_cmd and header flags.
Fixed mfi_tbolt_init_MFI_queue invalidating cm_index of the acquired mfi_cmd.
Reset now reinitialises sync map as well as AEN.
Fixed possible use of NULL pointer in mfi_build_and_issue_cmd
Fixed mfi_tbolt_init_MFI_queue call to mfi_process_fw_state_chg_isr causing
panic on failure.
Ensure that tbolt cards always initialise next_host_reply_index and
free_host_reply_index (based off mfi_max_fw_cmds) on both startup and
reset as per the linux driver.
Fixed mfi_tbolt_complete_cmd not acknowledging unknown commands so
it didn't clear the controller.
Prevent locks from being dropped and re-acquired in the following functions
which was allowing multiple threads to enter critical methods such as
mfi_tbolt_complete_cmd & mfi_process_fw_state_chg_isr:-
* mfi_tbolt_init_MFI_queue
* mfi_aen_complete / mfi_aen_register
* mfi_tbolt_sync_map_info
* mfi_get_log_state
* mfi_parse_entries
The locking for these functions was promoting to higher level methods. This
also fixed MFI_LINUX_SET_AEN_2 which was already acquiring the lock, so would
have paniced for recursive lock.
This also required changing malloc of ld_sync in mfi_tbolt_sync_map_info to
M_NOWAIT which can hence now fail but this was already expected as its return
was being tested.
Removed the assignment of cm_index in mfi_tbolt_init_MFI_queue which breaks
the world if the cmd returned by mfi_dequeue_free isn't the first cmd.
Fixed locking in mfi_data_cb, this is an async callback from bus_dmamap_load
which could hence be called after the caller has dropped the lock. If we
don't have the lock we aquire it and ensure we unlock before returning.
Fixed locking mfi_comms_init when mfi_dequeue_free fails.
Fixed mfi_build_and_issue_cmd not returning tbolt cmds aquired to the pool
on error.
Fixed mfi_abort not dropping the io lock when mfi_dequeue_free fails.
Added hw.mfi.polled_cmd_timeout sysctl that enables tuning of polled
timeouts. This shouldn't be reduced below 50 seconds as its used for
firmware patching which can take quite some time.
Added hw.mfi.fw_reset_test sysctl which is avaliable when compiled with
MFI_DEBUG and allows the testing of controller reset that was provoking a
large number of the issues encountered here.
Reviewed by: Doug Ambrisko
Approved by: pjd (mentor)
MFC after: 1 month
Removes a mtx_unlock call for mfi_io_lock which is never aquired
While I'm here fix a braceing style issue.
Reviewed by: Doug Ambrisko
Approved by: pjd (mentor)
MFC after: 1 month
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
function use that for JBOD and Thunderbolt disk write command. Now
we only have one implementation in mfi.
- Fix dumping on Thunderbolt cards. Polled IO commands do not seem to
be normally acknowledged by changing cmd_status to MFI_STAT_OK.
In order to get acknowledgement of the IO is complete, the Thunderbolt
command queue needs to be run through. I added a flag MFI_CMD_SCSI
to indicate this command is being polled and to complete the
Thunderbolt wrapper and indicate the result. This flag needs to be
set in the JBOD case in case if that us using Thunderbolt card.
When in the polling loop check for completed commands.
- Remove mfi_tbolt_is_ldio and just do the check when needed.
- Fix an issue when attaching of disk device happens when a device is
already scheduled to be attached but hasn't attached.
- add a tunable to allow raw disk attachment to CAM via:
hw.mfi.allow_cam_disk_passthrough=1
- fixup aborting of commands (AEN and LD state change). Use a generic
abort function and only wait the command being aborted not both.
Thunderbolt cards don't seem to abort commands so the abort times
out.
command properly. Without this change, mfi(4) always sends 10 byte READ
and WRITE commands, which will cause data corruption when device is
larger than 2^32 sectors.
PR: kern/173291
Submitted by: Steven Hartland <steven.hartland multiplay.co.uk>
Reviewed by: mav
MFC after: 2 weeks
the upper levels notice. Otherwise we see commands silently failing leading
to data corruption. This mirrors dadone()
Submitted by: Andrew Boyer aboyer@averesystems.com
Reviewed by: scottl@freebsd.org
MFC after: 2 weeks
The new driver changed the size of the mfi_dcmd_frame structure in such a
way that a MFI_IOC_PASSTHRU ioctl from an old amd64 binary is treated as an
MFI_IOC_PASSTHRU32 ioctl in the new driver. As a result, the user pointer
is treated as the buffer length. mfi_user_command() doesn't have a bounds
check on the buffer length, so it passes a really big value to malloc()
which panics when it tries to exhaust the kmem_map. Fix this two ways:
- Only honor MFI_IOC_PASSTHRU32 if the binary has the SV_ILP32 flag set,
otherwise treat it as an unknown ioctl.
- Add a bounds check on the buffer length passed by the user. For now
it fails any user attempts to use a buffer larger than 1MB.
While here, fix a few other nits:
- Remove an unnecessary check for a NULL return from malloc(M_WAITOK).
- Use the ENOTTY errno for invalid ioctl commands instead of ENOENT.
MFC after: 3 days
PAE to insta-panic on startup. Remove one unused variable that was
commented out.
Reviewed by: ambrisko@
Obtained from: jhb@ peter@ bz@ and countless others during BSDCAN
MFC after: 3 days
them to cleanup and goto out when acknowledging the LD's. Check
for failure on malloc. Remove a couple of extra lines and remove
the spurious return.
Prompted by: Petr Lampa
MFC after: 3 days
ThunderBolt cannot read sector >= 2^32 or 2^21
with supplied patch.
Second the bigger change, fix RAID operation on ThunderBolt base
card such as physically removing a disk from a RAID and replacing
it. The current situation is the RAID firmware effectively hangs
waiting for an acknowledgement from the driver. This is due to
the firmware support of the driver actually accessing the RAID
from under the firmware. This is an interesting feature that
the FreeBSD driver does not use. However, when the firmare
detects the driver has attached it then expects the driver will
synchronize LD's with the firmware. If the driver does not sync.
then the management part of the firmware will hang waiting for
it so a pulled driver will listed as still there.
The fix for this problem isn't extremely difficult. However,
figuring out why some of the code was the way it was and then
redoing it was involved. Not have a spec. made it harder to
try to figure out. The existing driver would send a
MFI_DCMD_LD_MAP_GET_INFO command in write mode to acknowledge
a LD state change. In read mode it gets the RAID map from the
firmware. The FreeBSD driver doesn't do that currently. It
could be added in the future with the appropriate structures.
To simplify things, get the current LD state and then build
the MFI_DCMD_LD_MAP_GET_INFO/write command so that it sends
an acknowledgement for each LD. The map would probably state
which LD's changed so then the driver could probably just
acknowledge the LD's that changed versus all. This doesn't seem
to be a problem. When a MFI_DCMD_LD_MAP_GET_INFO/write command
is sent to the firmware, it will complete later when a change
to the LD's happen. So it is very much like an AEN command
returning when something happened. When the
MFI_DCMD_LD_MAP_GET_INFO/write command completes, we refire the
sync'ing of the LD state. This needs to be done in as an event
so that MFI_DCMD_LD_GET_LIST can wait for that command to
complete before issuing the MFI_DCMD_LD_MAP_GET_INFO/write.
The prior code didn't use the call-back function and tried
to intercept the MFI_DCMD_LD_MAP_GET_INFO/write command when
processing an interrupt. This added a bunch of code complexity
to the interrupt handler. Using the call-back that is done
for other commands got rid of this need. So the interrupt
handler is greatly simplified. It seems that even commands
that shouldn't be acknowledged end up in the interrupt handler.
To deal with this, code was added to check to see if a command
is in the busy queue or not. This might have contributed to the
interrupt storm happening without MSI enabled on these cards.
Note that MFI_DCMD_LD_MAP_GET_INFO/read returns right away.
It would be interesting to see what other complexity could
be removed from the ThunderBolt driver that really isn't
needed in our mode of operation. Letting the RAID firmware
do all of the I/O to disks is a lot faster since it can
use its caches. It greatly simplifies what the driver has
to do and potential bugs if the driver and firmware are
not in sync.
Simplify the aen_abort/cm_map_abort and put it in the softc
versus in the command structure.
This should get merged to 9 before the driver is merged to
8.
PR: 167226
Submitted by: Petr Lampa
MFC after: 3 days
First cut of new HW support from LSI and merge into FreeBSD.
Supports Drake Skinny and ThunderBolt cards.
MFhead_mfi r227574
Style
MFhead_mfi r227579
Use bus_addr_t instead of uintXX_t.
MFhead_mfi r227580
MSI support
MFhead_mfi r227612
More bus_addr_t and remove "#ifdef __amd64__".
MFhead_mfi r227905
Improved timeout support from Scott.
MFhead_mfi r228108
Make file.
MFhead_mfi r228208
Fixed botched merge of Skinny support and enhanced handling
in call back routine.
MFhead_mfi r228279
Remove superfluous !TAILQ_EMPTY() checks before TAILQ_FOREACH().
MFhead_mfi r228310
Move mfi_decode_evt() to taskqueue.
MFhead_mfi r228320
Implement MFI_DEBUG for 64bit S/G lists.
MFhead_mfi r231988
Restore structure layout by reverting the array header to
use [0] instead of [1].
MFhead_mfi r232412
Put wildcard pattern later in the match table.
MFhead_mfi r232413
Use lower case for hexadecimal numbers to match surrounding
style.
MFhead_mfi r232414
Add more Thunderbolt variants.
MFhead_mfi r232888
Don't act on events prior to boot or when shutting down.
Add hw.mfi.detect_jbod_change to enable or disable acting
on JBOD type of disks being added on insert and removed on
removing. Switch hw.mfi.msi to 1 by default since it works
better on newer cards.
MFhead_mfi r233016
Release driver lock before taking Giant when deleting children.
Use TAILQ_FOREACH_SAFE when items can be deleted. Make code a
little simplier to follow. Fix a couple more style issues.
MFhead_mfi r233620
Update mfi_spare/mfi_array with the actual number of elements
for array_ref and pd. Change these max. #define names to avoid
name space collisions. This will require an update to mfiutil
It avoids mfiutil having to do a magic calculation.
Add a note and #define to state that a "SYSTEM" disk is really
what the firmware calls a "JBOD" drive.
Thanks to the many that helped, LSI for the initial code drop,
mav, delphij, jhb, sbruno that all helped with code and testing.
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.