freebsd-dev/sys/cam/scsi
Kenneth D. Merry 0c8f059c29 Fix a hang introduced in r351599.
My changes in r351599 (kindly committed by avg) made the cd(4) media check
asynchronous to avoid a sleep while holding a mutex.

Those changes contained a difficult-to-reproduce bug that caused a hang on
boot on some single-processor machines/VMs.  Leandro Lupori managed to
reproduce the bug, diagnose it, and supply a patch!  Here is his analysis,
from the PR:

======
I was able to reproduce the problem described in comment#14.

Actually, I wasn't trying to reproduce it, I just started seeing it a few
weeks ago, in CURRENT.

I can reproduce it consistently, by using QEMU to run a PowerPC64 VM with a
single core/thread (-smp 1).

It happens only when there is no media in the emulated CD-ROM, a device
that QEMU adds by default unless -nodefaults is specified on the command
line.

I've debugged it and this is what I've found:

1- After the CD probe succeeds, GEOM tries to open the device, which ends
up calling cdcheckmedia(), which sets the CD state to
CD_STATE_MEDIA_PREVENT.
2- Next, scsi_prevent() is executed and succeeds; the CD_FLAG_DISC_LOCKED
flag is set and the CD state moves to CD_STATE_MEDIA_SIZE.
3- Next, scsi_read_capacity() is executed and fails; the state is set to
CD_STATE_MEDIA_ALLOW, and cdmediaprobedone() is called and wakes up
cdcheckmedia().
4- Then, when cdstart() is invoked to process CD_STATE_MEDIA_ALLOW, it
first checks whether CD_FLAG_DISC_LOCKED is set and, if so, skips directly
to the CD_STATE_MEDIA_SIZE state. This repeats the steps of bullet 3,
entering an infinite MEDIA_SIZE command loop.

When there is at least one other core/thread, the GEOM thread that
performed the initial cdopen() gets scheduled again and closes the CD
device, which calls cdprevent(PR_ALLOW); that clears the
CD_FLAG_DISC_LOCKED flag and breaks the loop.

So, apparently, the problem is CD_STATE_MEDIA_ALLOW being skipped when
CD_FLAG_DISC_LOCKED is set. If I understand correctly, in this case, the
state should be advanced to CD_STATE_MEDIA_SIZE only when the current state
is CD_STATE_MEDIA_PREVENT.
=====
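
The loop described in the analysis above can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions, not the
code from scsi_cd.c: struct cd_model, model_start(), model_media_check(),
the "fixed" switch and the step budget are all hypothetical, each command
is assumed to complete immediately, no media is present, and the ALLOW
step is assumed to clear the lock flag and end the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the media-check states named in the analysis. */
enum cd_state {
	CD_STATE_NORMAL,	/* used here as the model's "check finished" state */
	CD_STATE_MEDIA_PREVENT,
	CD_STATE_MEDIA_SIZE,
	CD_STATE_MEDIA_ALLOW
};

/* Hypothetical model of the per-device state; not the real softc. */
struct cd_model {
	enum cd_state state;
	bool disc_locked;	/* models CD_FLAG_DISC_LOCKED */
};

/*
 * Model of the start step: decide which state is actually processed.
 * With fixed == false, a locked disc skips ahead to MEDIA_SIZE from both
 * MEDIA_PREVENT and MEDIA_ALLOW (the behavior the analysis describes).
 * With fixed == true, the skip applies only in MEDIA_PREVENT.
 */
static enum cd_state
model_start(struct cd_model *cd, bool fixed)
{
	bool may_skip = cd->state == CD_STATE_MEDIA_PREVENT ||
	    (!fixed && cd->state == CD_STATE_MEDIA_ALLOW);

	if (cd->disc_locked && may_skip)
		cd->state = CD_STATE_MEDIA_SIZE;
	return (cd->state);
}

/*
 * Run one media check with no media present. Returns the number of
 * capacity reads issued, or -1 if the check did not finish within
 * max_steps (that is, the model is stuck in the loop).
 */
static int
model_media_check(bool fixed, int max_steps)
{
	struct cd_model cd = { CD_STATE_MEDIA_PREVENT, false };
	int size_cmds = 0;

	assert(max_steps > 0);
	for (int i = 0; i < max_steps; i++) {
		switch (model_start(&cd, fixed)) {
		case CD_STATE_MEDIA_PREVENT:
			/* Prevent succeeds: lock the disc, go read the size. */
			cd.disc_locked = true;
			cd.state = CD_STATE_MEDIA_SIZE;
			break;
		case CD_STATE_MEDIA_SIZE:
			/* No media, so the capacity read fails. */
			size_cmds++;
			cd.state = CD_STATE_MEDIA_ALLOW;
			break;
		case CD_STATE_MEDIA_ALLOW:
			/* Assumption: allow clears the lock and ends the check. */
			cd.disc_locked = false;
			cd.state = CD_STATE_NORMAL;
			break;
		case CD_STATE_NORMAL:
			return (size_cmds);
		}
	}
	return (-1);
}
```

Under these assumptions, model_media_check(false, 100) exhausts its step
budget and returns -1, because every pass through MEDIA_ALLOW is redirected
to MEDIA_SIZE while the lock flag stays set. model_media_check(true, 100)
finishes after a single capacity read, since MEDIA_ALLOW is now processed
and clears the flag.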

PR:		kern/219857
Submitted by:	Leandro Lupori <leandro.lupori@gmail.com>
MFC after:	1 week
2019-12-02 19:57:39 +00:00
scsi_all.c Set handling for some "Logical unit not ready" errors. 2019-11-20 20:00:03 +00:00
scsi_all.h Make camcontrol modepage support block descriptors. 2019-08-07 14:45:10 +00:00
scsi_cd.c Fix a hang introduced in r351599. 2019-12-02 19:57:39 +00:00
scsi_cd.h scsi_cd: make the media check asynchronous 2019-08-29 07:51:11 +00:00
scsi_ch.c cam_periph_acquire() now returns an errno. 2018-03-19 20:19:00 +00:00
scsi_ch.h sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
scsi_da.c Fix a race between daopen and damediapoll 2019-11-13 01:58:43 +00:00
scsi_da.h Improve support for informational exceptions. 2016-12-19 10:25:47 +00:00
scsi_enc_internal.h Make CAM use root_mount_hold_token() to delay boot. 2019-11-22 18:39:51 +00:00
scsi_enc_safte.c Improve AHCI Enclosure Management and SES interoperation. 2019-06-23 19:05:01 +00:00
scsi_enc_ses.c Fix assumptions of only one device per SES slot. 2019-09-11 03:25:30 +00:00
scsi_enc.c Make CAM use root_mount_hold_token() to delay boot. 2019-11-22 18:39:51 +00:00
scsi_enc.h Improve AHCI Enclosure Management and SES interoperation. 2019-06-23 19:05:01 +00:00
scsi_iu.h
scsi_message.h Add partial support for QUERY TMF to CAM and isp(4). 2015-10-23 18:34:18 +00:00
scsi_pass.c Drop periph lock around cam_periph_unmapmem(). 2019-05-06 19:08:03 +00:00
scsi_pass.h sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
scsi_pt.c Return a C errno for cam_periph_acquire(). 2018-02-06 06:42:25 +00:00
scsi_pt.h sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
scsi_sa.c cam_periph_runccb() changed several years ago to overwrite the ccb callback 2018-05-01 20:09:29 +00:00
scsi_sa.h sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
scsi_ses.h Improve AHCI Enclosure Management and SES interoperation. 2019-06-23 19:05:01 +00:00
scsi_sg.c Remove NEEDGIANT from the scsi_sg /dev node. It likely has not been 2019-11-22 18:18:36 +00:00
scsi_sg.h
scsi_targ_bh.c sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
scsi_target.c Define xpt_path_inq. 2017-12-06 23:05:22 +00:00
scsi_targetio.h sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
scsi_xpt.c Take proper lock in ses_setphyspath_callback(). 2019-08-29 17:02:02 +00:00
smp_all.c sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00
smp_all.h sys/cam: further adoption of SPDX licensing ID tags. 2017-11-27 15:12:43 +00:00