freebsd-skq

Author	SHA1	Message	Date
Conrad Meyer	797f009d59	gmirror: Relocate DEVICE_FLAGS to adjacent lines gmirror's sc_flags is shared between some on-disk state and some runtime only state. There's no real reason for that and they could probably be split up. Until they are, locate all of the flags for the same field nearby each other in the source, for clarity. No functional change. Sponsored by: Dell EMC Isilon	2019-01-23 16:44:21 +00:00
Mark Johnston	438622af06	Use g_handleattr() to reply to GEOM::candelete queries. g_handleattr() fills out bp->bio_completed; otherwise, g_getattr() returns an error in response to the query. This caused BIO_DELETE support to not be propagated through stacked configurations, e.g., a gconcat of gmirror volumes would not handle BIO_DELETE even when the gmirrors do. g_io_getattr() was not affected by the problem. PR: 232676 Reported and tested by: noah.bergbauer@tum.de MFC after: 1 week	2019-01-02 15:52:16 +00:00
Alexander Motin	02a9923034	Switch from mutexes to atomics in GEOM_DEV I/O path. Mutexes in I/O path there were used twice per I/O to atomically access several variables to close and/or destroy the device on last request completion. I found the way to fit all required info into one integer, suitable for atomic operations. It opened race window on device close, but addition of timeout to the msleep() there should cover it. Profiling shows removal of significant spinning time on those mutexes and IOPS increase from ~600K to >800K to NVMe on 72-core systems. MFC after: 1 month Sponsored by: iXsystems, Inc.	2018-12-27 19:15:24 +00:00
Conrad Meyer	d2d82bfc90	gmirror: Remove a last-minute INVARIANTS breakage in r341840 I mistakenly added a lock assertion to this routine at the last minute without confirming it was held during g_mirror_create. It isn't (it isn't even initialized yet). Mea culpa. Access is exclusive in both callers, just not always by that particular lock. Reported by: lwhsu X-MFC-With: r341840, r341674	2018-12-12 18:13:56 +00:00
Conrad Meyer	23c25bd8b1	gmirror: Fix a bug introduced in r341674 r341674 inadvertently introduced a bug where newer mirror components being tasted would clear the high sc_flags that are not controlled by component metadata, such as G_MIRROR_DEVICE_FLAG_TASTING. This could plausibly expose a small window of time during STARTING where device destruction might race with mirror component addition, probably resulting in a crash. Reviewed by: markj X-MFC-With: r341674 Differential Revision: https://reviews.freebsd.org/D18521	2018-12-12 05:48:27 +00:00
Conrad Meyer	af7dcae0e2	gmirror: Evaluate mirror components against newest metadata copy Re-apply r341665 with format strings fixed. If we happen to taste a stale mirror component first, don't reject valid, newer components that have differing metadata from the stale component (during STARTING). Instead, update our view of the most recent metadata as we taste components. Like mediasize beforehand, remove some checks from g_mirror_check_metadata which would evict valid components due to metadata that can change over a mirror's lifetime. g_mirror_check_metadata is invoked long before we check genid/syncid and decide which component(s) are newest and whether or not we have quorum. Before checking if we can enter RUNNING (i.e., we have quorum) after a NEW component is added, first remove any known stale or inconsistent disks from the mirrorset, rather than removing them after deciding we have quorum. Check if we have quorum after removing these components. Additionally, add a knob, kern.geom.mirror.launch_mirror_before_timeout, to force gmirrors to wait out the full timeout (kern.geom.mirror.timeout) before transitioning from STARTING to RUNNING. This is a kludge to help ensure all eligible, boot-time available mirror components are tasted before RUNNING a gmirror. Add a basic test case for STARTING -> RUNNING startup behavior around stale genids. PR: 232671, 232835 Submitted by: Cindy Yang <cyang AT isilon.com> (previous version) Reviewed by: markj (kernel portions) Discussed with: asomers, Cindy Yang Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D18062	2018-12-07 02:44:04 +00:00
Conrad Meyer	c4e87bdfc1	Revert r341665 due to tinderbox breakage I didn't notice that some format strings were non-portable. Will fix and re-commit later.	2018-12-07 00:47:05 +00:00
Conrad Meyer	bc1ee0be2d	gmirror: Evaluate mirror components against newest metadata copy If we happen to taste a stale mirror component first, don't reject valid, newer components that have differing metadata from the stale component (during STARTING). Instead, update our view of the most recent metadata as we taste components. Like mediasize beforehand, remove some checks from g_mirror_check_metadata which would evict valid components due to metadata that can change over a mirror's lifetime. g_mirror_check_metadata is invoked long before we check genid/syncid and decide which component(s) are newest and whether or not we have quorum. Before checking if we can enter RUNNING (i.e., we have quorum) after a NEW component is added, first remove any known stale or inconsistent disks from the mirrorset, rather than removing them after deciding we have quorum. Check if we have quorum after removing these components. Additionally, add a knob, kern.geom.mirror.launch_mirror_before_timeout, to force gmirrors to wait out the full timeout (kern.geom.mirror.timeout) before transitioning from STARTING to RUNNING. This is a kludge to help ensure all eligible, boot-time available mirror components are tasted before RUNNING a gmirror. When we are instructed to forget mirror components, bump the generation id to avoid confusion with such stale components later. Add a basic test case for STARTING -> RUNNING startup behavior around stale genids. PR: 232671, 232835 Submitted by: Cindy Yang <cyang AT isilon.com> (previous version) Reviewed by: markj (kernel portions) Discussed with: asomers, Cindy Yang Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D18062	2018-12-06 23:55:39 +00:00
Kirk McKusick	fb14e73cb4	Normally when an attempt is made to mount a UFS/FFS filesystem whose superblock has a check-hash error, an error message noting the superblock check-hash failure is printed and the mount fails. The administrator then runs fsck to repair the filesystem and when successful, the filesystem can once again be mounted. This approach fails if the filesystem in question is a root filesystem from which you are trying to boot. Here, the loader fails when trying to access the filesystem to get the kernel to boot. So it is necessary to allow the loader to ignore the superblock check-hash error and make a best effort to read the kernel. The filesystem may be sufficiently corrupted that the read attempt fails, but there is no harm in trying since the loader makes no attempt to write to the filesystem. Once the kernel is loaded and starts to run, it attempts to mount its root filesystem. Once again, failure means that it breaks to its prompt to ask where to get its root filesystem. Unless you have an alternate root filesystem, you are stuck. Since the root filesystem is initially mounted read-only, it is safe to make an attempt to mount the root filesystem with the failed superblock check-hash. Thus, when asked to mount a root filesystem with a failed superblock check-hash, the kernel prints a warning message that the root filesystem superblock check-hash needs repair, but notes that it is ignoring the error and proceeding. It does mark the filesystem as needing an fsck which prevents it from being enabled for writing until fsck has been run on it. The net effect is that the reboot fails to single user, but at least at that point the administrator has the tools at hand to fix the problem. Reported by: Rick Macklem (rmacklem@) Discussed with: Warner Losh (imp@) Sponsored by: Netflix	2018-12-06 00:09:39 +00:00
Maxim Sobolev	9dcafe16d4	Another attempt to fix issue with the DIOCGDELETE ioctl(2) not handling slightly out-of-bound requests properly (r340187). Perform range check here rather than rely on g_delete_data() to DTRT. The g_delete_data() would always return success for requests starting just the next byte after provider's media boundary. MFC after: 4 weeks	2018-12-04 21:48:56 +00:00
Dag-Erling Smørgrav	cdd2df880d	Add a “skip_dsn” option to g_part's bootcode verb to prevent g_part_mbr from setting the volume serial number. This unbreaks older boot blocks that don't support serial numbers, and allows boot0cfg to set the serial number itself if requested by the user. Submitted by: lev@, yuripv@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D17386	2018-11-27 14:58:19 +00:00
Maxim Sobolev	de66da7374	Revert r340187, it breaks EOD (end-of-device) detection logic. Turns out, i/o into last_sector+N is handled differently for N==1 and N>1 cases to accommodate that, so some other approach would be needed to fix DIOCGDELETE ioctl(2).	2018-11-07 16:28:09 +00:00
Maxim Sobolev	8948179aba	Don't allow BIO_READ, BIO_WRITE or BIO_DELETE requests that are fully beyond the end of provider's media. The only exception is made for the zero length transfers which are allowed to be just on the boundary. Previously, any requests starting on the boundary (i.e. next byte after the last one) have been allowed to go through. No response from: freebsd-geom@, phk MFC after: 1 month	2018-11-06 15:55:41 +00:00
Mark Johnston	25c9cca757	Have gconcat advertise delete support if one of its disks does. This follows the example set by other multi-disk GEOM classes. PR: 232676 Tested by: noah.bergbauer@tum.de MFC after: 1 month	2018-10-30 00:22:14 +00:00
Eugene Grosbein	6d305ab0b2	Extend stripeoffset and stripesize of GEOMs from u_int to off_t GEOM's stripeoffset overflows at 4 gigabyte margin (2^32) because of its u_int type. This leads to incorrect data in the output generated by "sysctl kern.geom.confxml" command, "graid list" etc. when GEOM array has volumes larger than 4G, for example. This change does not affect ABI but changes KBI. No MFC planned. Differential Revision: https://reviews.freebsd.org/D13426	2018-10-27 16:14:42 +00:00
Xin LI	0db665bb98	Restore backward compatibility for "attach" verb. In r332361 and r333439, two new parameters were added to geli attach verb using gctl_get_paraml, which requires the value to be present. This would prevent old geli(8) binary from attaching geli(4) device as they have no knowledge about the new parameters. Restore backward compatibility by treating the absence of these two values as seeing the default value supplied by userland. PR: 232595 Reviewed by: oshogbo MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D17680	2018-10-27 03:37:14 +00:00
Glen Barber	01d4e2149e	MFH r338661 through r339200. Sponsored by: The FreeBSD Foundation	2018-10-05 17:53:47 +00:00
Alexander Motin	cf4a52cf67	Fix use-after-free in RAID0 error reporting of GEOM_RAID. PR: 231510 Submitted by: yangx92@hotmail.com Approved by: re (gjb) MFC after: 1 week	2018-09-24 16:58:55 +00:00
Jung-uk Kim	9c40dcbe5f	Make geli(8) buildable.	2018-09-19 07:08:04 +00:00
Conrad Meyer	1b0909d51a	OpenCrypto: Convert sessions to opaque handles instead of integers Track session objects in the framework, and pass handles between the framework (OCF), consumers, and drivers. Avoid redundancy and complexity in individual drivers by allocating session memory in the framework and providing it to drivers in ::newsession(). Session handles are no longer integers with information encoded in various high bits. Use of the CRYPTO_SESID2FOO() macros should be replaced with the appropriate crypto_ses2foo() function on the opaque session handle. Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to the opaque handle interface. Discard existing session tracking as much as possible (quick pass). There may be additional code ripe for deletion. Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style interface. The conversion is largely mechanical. The change is documented in crypto.9. Inspired by https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html . No objection from: ae (ipsec portion) Reported by: jhb	2018-07-18 00:56:25 +00:00
Conrad Meyer	1df7f41560	OCF: Convert consumers to the session id typedef These were missed in the earlier r336269. No functional change. Sponsored by: Dell EMC Isilon	2018-07-16 19:01:05 +00:00
Mariusz Zaborski	78f79a9a08	Let geli deal with lost devices without crashing. PR: 162036 Submitted by: Fabian Keil <fk@fabiankeil.de> Obtained from: ElectroBSD Discussed with: pjd@	2018-07-15 18:03:19 +00:00
Warner Losh	4bae19e9b8	g_eli_key_cmp is used only in the kernel, so only define it in the kernel.	2018-07-13 18:21:38 +00:00
Mikolaj Golub	874774c5d4	geom_gate: enable resize Reviewed By: pjd Approved By: pjd Differential Revision: https://reviews.freebsd.org/D11531	2018-07-13 07:08:06 +00:00
Ed Maste	76db6c8773	gpart: add EFI alias for MBR partition scheme Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D15870	2018-06-17 20:10:48 +00:00
Ed Maste	a0a8412b2a	Sort geom/part mbr/ebr/ldm alias table entries Having the table entries in alpha order simplifies future additions. Sponsored by: The FreeBSD Foundation	2018-06-17 20:06:27 +00:00
Mariusz Zaborski	31f7586d73	Introduce the 'n' flag for the geli attach command. If the 'n' flag is provided the provided key number will be used to decrypt device. This can be used combined with dryrun to verify if the key is set correctly. This can be also used to determine which key slot we want to change on already attached device. Reviewed by: allanjude Differential Revision: https://reviews.freebsd.org/D15309	2018-05-09 20:53:38 +00:00
Mark Johnston	bd92e6b6f5	Refactor some of the MI kernel dump code in preparation for netdump. - Add clear_dumper() to complement set_dumper(). - Drain netdump's preallocated mbuf pool when clearing the dumper. - Don't do bounds checking for dumpers with mediasize 0. - Add dumper callbacks for initialization for writing out headers. Reviewed by: sbruno MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15252	2018-05-06 00:22:38 +00:00
Mark Johnston	681554d70b	Remove a redundant assertion. MFC after: 1 week Sponsored by: Dell EMC Isilon	2018-05-06 00:05:03 +00:00
Mark Johnston	40e805221b	Avoid dropping the topology lock in gmirror's dumpconf implementation. Doing so introduces races which can lead to a use-after-free when grabbing a snapshot of the GEOM mesh. To ensure that a mirror's disk list remains stable, change its locking protocol: both the softc lock and the topology lock are now required to modify the list, so either lock is sufficient for traversal. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2018-05-06 00:03:24 +00:00
Ed Maste	b525a10ac0	gpart: add fat32lba MBR partition type FAT32 partition with LBA addressing. Reviewed by: marcel MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D15266	2018-05-04 00:34:27 +00:00
Kyle Evans	74d6c131cb	Annotate geom modules with MODULE_VERSION GEOM ELI may double ask the password during boot. Once at loader time, and once at init time. This happens due to a module loading bug. By default GEOM ELI caches the password in the kernel, but without the MODULE_VERSION annotation, the kernel loads over the kernel module, even if the GEOM ELI was compiled into the kernel. In this case, the newly loaded module purges/invalidates/overwrites the GEOM ELI's password cache, which causes the double asking. MFC Note: There's a pc98 component to the original submission that is omitted here due to pc98 removal in head. This part will need to be revived upon MFC. Reviewed by: imp Submitted by: op Obtained from: opBSD MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14992	2018-04-10 19:18:16 +00:00
Mariusz Zaborski	8f1c45c20a	Introduce dry run option for attaching the device. This will allow us to verify if passphrase and key is valid without decrypting whole device. Reviewed by: cem@, allanjude@ Differential Revision: https://reviews.freebsd.org/D15000	2018-04-10 13:22:48 +00:00
Kyle Evans	2967ace894	Retire the geom_aes class It's had a good life, but it's not really configurable and not really used. Obtained from: opBSD (with some changes) Differential Revision: https://reviews.freebsd.org/D14991	2018-04-09 17:30:30 +00:00
Brooks Davis	6469bdcdb6	Move most of the contents of opt_compat.h to opt_global.h. opt_compat.h is mentioned in nearly 180 files. In-progress network driver compatibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941	2018-04-06 17:35:35 +00:00
Sean Bruno	2c385d51ce	Squash error from geom by sizing ident strings to DISK_IDENT_SIZE. Display attribute in future error strings and differentiate g_handleattr() error messages for ease of debugging in the future. "g_handleattr: md1 bio_length 24 strlen 31 -> EFAULT" Reported by: swills Reviewed by: imp cem avg Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14962	2018-04-05 13:56:40 +00:00
Kirk McKusick	fb15890a8c	When freeing a superblock returned by ffs_sbget, be sure to also free the superblock summary information. Reported by: Peter Holm (pho@) Tested by: Peter Holm (pho@)	2018-03-24 15:36:25 +00:00
Mariusz Zaborski	9ea857cf0f	Remove unneeded variable which was introduced in r328472. Pointed out by: pjd@	2018-03-18 15:09:55 +00:00
Andriy Gapon	aca41af247	g_access: deal with races created by geoms that drop the topology lock The problem is that g_access() must be called with the GEOM topology lock held. And that gives a false impression that the lock is indeed held across the call. But this isn't always true because many classes, ZVOL being one of the many, need to drop the lock. It's either to perform an I/O on the first open or to acquire a different lock (like in g_mirror_access). That, of course, can break many assumptions. For example, g_slice_access() adds an extra exclusive count on the first open. As described above, an underlying geom may drop the topology lock and that would open a race with another thread that would also request another extra exclusive count. In general, two consumers may be granted incompatible accesses. To avoid this problem the code is changed to mark a geom with special flag before calling its access method and clear the flag afterwards. If another thread sees that flag, then it means that the topology lock has been dropped (either by the geom in question or downstream from it), so it is not safe to make another access call. So, the second thread would use g_topology_sleep() to wait until the flag is cleared and only then would it proceed with the access. Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea PR: 225960 Reported by: asomers Reviewed by: markj, mav MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14533	2018-03-15 09:16:10 +00:00
Conrad Meyer	ee4d316fe7	g_part_gpt: Fix memory leak in error path If g_part_gpt_read() encountered a disk with bad primary and secondary tables, it could leak memory. Reported by: Coverity Sponsored by: Dell EMC Isilon	2018-03-07 01:55:50 +00:00
Conrad Meyer	90575a0ec9	g_label_ufs: Fix typo from r330264 Reported by: O. Hartmann <o.hartmann AT walstatt.org> Sponsored by: Dell EMC Isilon	2018-03-02 06:02:54 +00:00
Kirk McKusick	efbf396426	This change is some refactoring of Mark Johnston's changes in r329375 to fix the memory leak that I introduced in r328426. Instead of trying to clear up the possible memory leak in all the clients, I ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures that memory is always freed if it returns an error). The original change in r328426 was a bit sparse in its description. So I am expanding on its description here (thanks cem@ and rgrimes@ for your encouragement for my longer commit messages). In preparation for adding check hashing to superblocks, r328426 is a refactoring of the code to get the reading/writing of the superblock into one place. Unlike the cylinder group reading/writing which ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel and cgget/cgput in libufs), I have the core superblock functions just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is already imported into utilities like fsck_ffs as well as libufs to implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions take a function pointer to do the actual I/O for which there are four variants: ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem g_use_g_read_data / g_use_g_write_data for kernel geom clients ufs_use_sa_read for the standalone code (stand/libsa/ufs.c but not stand/libsa/ufsread.c which is size constrained) use_pread / use_pwrite for libufs Uses of these interfaces are in the UFS filesystem, geoms journal & label, libsa changes, and libufs. They also permeate out into the filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck, fsirand, fstyp, and quot. Some of these utilities should probably be converted to directly use libufs (like dumpfs was for example), but there does not seem to be much win in doing so. Tested by: Peter Holm (pho@)	2018-03-02 04:34:53 +00:00
Mark Johnston	16759360d4	Fix a memory leak introduced in r328426. ffs_sbget() may return a superblock buffer even if it fails, so the caller must be prepared to free it in this case. Moreover, when tasting alternate superblock locations in a loop, ffs_sbget()'s readfunc callback must free the previously allocated buffer. Reported and tested by: pho Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D14390	2018-02-16 15:41:03 +00:00
Alan Somers	834063202a	gpart: append partition name to the underlying provider's physical path If the underlying provider's physical path is null, then the gpart device's physical path will be, too. Otherwise, it will append the partition name, such as "/p1" or "/s1/a". This will make gpart work better with zfsd(8). PR: 224965 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D14010	2018-02-14 20:26:09 +00:00
Alan Somers	0bab7fa8a7	geli: append "/eli" to the underlying provider's physical path If the underlying provider's physical path is null, then the geli device's physical path will be, too. Otherwise, it will append "/eli". This will make geli work better with zfsd(8). PR: 224962 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D13979	2018-02-14 20:15:32 +00:00
Justin Hibbits	d793587fe2	Fix a panic introduced in r329225 Some GEOM partition tables may be destroyed with incomplete partition entries. Guard against this with NULL checks. Reported by: pholm,others Reviewed by: markj Tested by: pholm	2018-02-14 15:12:09 +00:00
Justin Hibbits	08a3b42fdb	Narrow a race, and fix a leak, in g_part_wither A race in g_part_wither() can lead to I/O being performed with a freed GEOM when the device disappears. Close the race as best as we can for now, following the code patterns from g_part_ctl_destroy() and g_part_ctl_undo(). This also fixes a leak, as g_wither_geom() does not wither providers, it only orphans them, so the partition entries would never get destroyed in g_wither_washer(). Note, this is not a complete fix, it can still race with g_part_start(), the race has merely been narrowed. Reviewed by: markj Sponsored by: Dell EMC Isilon	2018-02-13 17:40:09 +00:00
Conrad Meyer	b42712a8b7	Add GUID and alias for Apple APFS partition PR: 225813 Submitted by: James Wright <james.wright AT jigsawdezign.com>	2018-02-11 06:57:20 +00:00
Mark Johnston	0d02f6c201	Simplify synchronization read error handling. Since synchronization reads are performed by submitting a request to the external mirror provider, we know that the request returns with an error only when gmirror was unable to read a copy of the block from any mirror. Thus, there is no need to retry the request from the synchronization error handler. Tested by: pho MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2018-02-06 16:02:33 +00:00
Alan Somers	f5b4099e6b	geom: don't write stack garbage in disk labels Most consumers of g_metadata_store were passing in partially unallocated memory, resulting in stack garbage being written to disk labels. Fix them by zeroing the memory first. gvirstor repeated the same mistake, but in the kernel. Also, glabel's label contained a fixed-size string that wasn't initialized to zero. PR: 222077 Reported by: Maxim Khitrov <max@mxcrypt.com> Reviewed by: cem MFC after: 3 weeks X-MFC-With: 323314 X-MFC-With: 323338 Differential Revision: https://reviews.freebsd.org/D14164	2018-02-04 14:49:55 +00:00

2209 Commits