Discussion:
I got a panic for "nvme0: cpl does not map to outstanding cmd" on a MACHIATObin Double Shot
Mark Millard via freebsd-arm
2021-05-23 04:44:22 UTC
Permalink
# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>

Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.

"db> dump" seems to have worked.

After the reboot, zpool scrub found no errors.

For reference:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 ***@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Warner Losh
2021-05-23 05:16:02 UTC
Permalink
On Sat, May 22, 2021 at 10:44 PM Mark Millard via freebsd-arm <
Post by Mark Millard via freebsd-arm
# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cid 0x7e has no currently active command. The cid is used by the driver
to map completions back to requests.

So, there's usually 3 possibilities that I've seen this with.

(1) There's a missing cache flush so you get a bogus cpl back because
something stale
was read. It's unlikely to be this one because the rest of this look like a
successful
command completed: sc = 0 is successful completion and sct is a generic
command queued.

(2) We're looking at the completion record twice because we failed to
properly update the
head pointer and we've already completed the command. I've only ever seen
this in a
panic situation where we interrupt the completion routine because something
else
paniced.

(3) There's something that's corrupting the act_tr array in the qpair. I've
not seen this,
but if something else smashes that area (zeroing it in this case), then
that could cause
an error like this.

Or it could be something new I've not seen nor thought about before.
Post by Mark Millard via freebsd-arm
cpuid = 3
time = 1621743752
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>
Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.
"db> dump" seems to have worked.
After the reboot, zpool scrub found no errors.
# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1
main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021
arm64 aarch64 1400013 1400013
If you have the dump, I suggest starting to make sure that the act_tr array
looks sane. Make
sure all the live pointers point to a sane looking tr. Make sure that tr is
on the active list, etc

It will take a fair amount of driver reading, though, to see how we got
here. I'd also check to
make sure that qpair->num_enttries > cpl.cid (0x3e in this case).

Warner
Mark Millard via freebsd-current
2021-05-23 07:08:20 UTC
Permalink
Post by Mark Millard via freebsd-arm
# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cid 0x7e has no currently active command. The cid is used by the driver
to map completions back to requests.
So, there's usually 3 possibilities that I've seen this with.
(1) There's a missing cache flush so you get a bogus cpl back because something stale
was read. It's unlikely to be this one because the rest of this look like a successful
command completed: sc = 0 is successful completion and sct is a generic command queued.
(2) We're looking at the completion record twice because we failed to properly update the
head pointer and we've already completed the command. I've only ever seen this in a
panic situation where we interrupt the completion routine because something else
paniced.
(3) There's something that's corrupting the act_tr array in the qpair. I've not seen this,
but if something else smashes that area (zeroing it in this case), then that could cause
an error like this.
Of note may be that I buildworld and buildkernel with extra
tuning enabled, targeting the cortex-a72. In one past example
this lead to finding a missing synchronization related to XHCI
handling that was fixed. (The fix was not aarch64 specific at
all.) For that: A cortex-a53 did not show the problem with or
without that tuning. A cortex-a72 showed the problem only with
the cortex-a72 tuning, not with targeting a cortex-a53 tuning
or generic armv7, for example.

Not that I've any evidence specifically suggesting such would
be involved here. But it might be good to keep in mind as a
possaibility.
Post by Mark Millard via freebsd-arm
Or it could be something new I've not seen nor thought about before.
cpuid = 3
time = 1621743752
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>
Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.
"db> dump" seems to have worked.
After the reboot, zpool scrub found no errors.
# uname -apKU
If you have the dump, I suggest starting to make sure that the act_tr array looks sane. Make
sure all the live pointers point to a sane looking tr. Make sure that tr is on the active list, etc
It will take a fair amount of driver reading, though, to see how we got here. I'd also check to
make sure that qpair->num_enttries > cpl.cid (0x3e in this case).
Okay. I got this while trying to test an odd diff -r over NFS
issue with the more recent software. So the two will potentially
compete for time.

As investigation will be exploratory for me, not familiar, I'll
probably publish periodic notes on things as I go along looking
at stuff.

My first is that the /var/crash/core.txt.0 has a gdb backtrace:

. . .
#10 0xffff00000047900c in panic (
fmt=0x12 <error: Cannot access memory at address 0x12>)
at /usr/main-src/sys/kern/kern_shutdown.c:843
#11 0xffff0000002226b4 in nvme_qpair_process_completions (
qpair=***@entry=0xffffa00008724300)
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
#12 0xffff000000223354 in nvme_timeout (arg=***@entry=0xffffa0000b053980)
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:938
#13 0xffff000000495bf8 in softclock_call_cc (c=0xffffa0000b0539a0,
cc=***@entry=0xffff000000de3500 <cc_cpu+768>, direct=0)
at /usr/main-src/sys/kern/kern_timeout.c:696
#14 0xffff000000495fb0 in softclock (arg=0xffff000000de3500 <cc_cpu+768>)
at /usr/main-src/sys/kern/kern_timeout.c:816
#15 0xffff0000004356dc in intr_event_execute_handlers (p=<optimized out>,
ie=0xffffa000058bc700) at /usr/main-src/sys/kern/kern_intr.c:1168
#16 ithread_execute_handlers (p=<optimized out>, ie=0xffffa000058bc700)
at /usr/main-src/sys/kern/kern_intr.c:1181
#17 ithread_loop (arg=<optimized out>, ***@entry=0xffffa000058aef60)
at /usr/main-src/sys/kern/kern_intr.c:1269
#18 0xffff000000431f6c in fork_exit (
callout=0xffff000000435430 <ithread_loop>, arg=0xffffa000058aef60,
frame=0xffff0000eb7cc990) at /usr/main-src/sys/kern/kern_fork.c:1083
#19 <signal handler called>

So via kgdb . . .

(kgdb) up 11
#11 0xffff0000002226b4 in nvme_qpair_process_completions (qpair=***@entry=0xffffa00008724300) at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
617 KASSERT(0, ("received completion for unknown cmd"));

(kgdb) print/x cpl.cid
$4 = 0x7e
(kgdb) print/x qpair->num_entries
$5 = 0x100

Based on also seeing the code:

qpair->act_tr = malloc_domainset(sizeof(struct nvme_tracker *) *
qpair->num_entries, M_NVME, DOMAINSET_PREF(qpair->domain),
M_ZERO | M_WAITOK);

(kgdb) print qpair->act_tr
$6 = (struct nvme_tracker **) 0xffffa00008725800
(kgdb) x/256g 0xffffa00008725800
0xffffa00008725800: 0x0000000000000000 0x0000000000000000
0xffffa00008725810: 0x0000000000000000 0x0000000000000000
. . .
0xffffa00008725fe0: 0x0000000000000000 0x0000000000000000
0xffffa00008725ff0: 0x0000000000000000 0x0000000000000000

It was all zeros (null pointers). No "live" pointers and, so,
no tr's to inspect.

As none of this is familiar context beyond general programming
concepts, it may be some time before I find anything else
potentially of interest to report. If you have other specific
things you would like me to look at, let me know.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Mark Millard via freebsd-current
2021-05-23 07:46:28 UTC
Permalink
Post by Mark Millard via freebsd-current
Post by Mark Millard via freebsd-arm
# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cid 0x7e has no currently active command. The cid is used by the driver
to map completions back to requests.
So, there's usually 3 possibilities that I've seen this with.
(1) There's a missing cache flush so you get a bogus cpl back because something stale
was read. It's unlikely to be this one because the rest of this look like a successful
command completed: sc = 0 is successful completion and sct is a generic command queued.
(2) We're looking at the completion record twice because we failed to properly update the
head pointer and we've already completed the command. I've only ever seen this in a
panic situation where we interrupt the completion routine because something else
paniced.
(3) There's something that's corrupting the act_tr array in the qpair. I've not seen this,
but if something else smashes that area (zeroing it in this case), then that could cause
an error like this.
Of note may be that I buildworld and buildkernel with extra
tuning enabled, targeting the cortex-a72. In one past example
this lead to finding a missing synchronization related to XHCI
handling that was fixed. (The fix was not aarch64 specific at
all.) For that: A cortex-a53 did not show the problem with or
without that tuning. A cortex-a72 showed the problem only with
the cortex-a72 tuning, not with targeting a cortex-a53 tuning
or generic armv7, for example.
Not that I've any evidence specifically suggesting such would
be involved here. But it might be good to keep in mind as a
possaibility.
Post by Mark Millard via freebsd-arm
Or it could be something new I've not seen nor thought about before.
cpuid = 3
time = 1621743752
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>
Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.
"db> dump" seems to have worked.
After the reboot, zpool scrub found no errors.
# uname -apKU
If you have the dump, I suggest starting to make sure that the act_tr array looks sane. Make
sure all the live pointers point to a sane looking tr. Make sure that tr is on the active list, etc
It will take a fair amount of driver reading, though, to see how we got here. I'd also check to
make sure that qpair->num_enttries > cpl.cid (0x3e in this case).
Okay. I got this while trying to test an odd diff -r over NFS
issue with the more recent software. So the two will potentially
compete for time.
As investigation will be exploratory for me, not familiar, I'll
probably publish periodic notes on things as I go along looking
at stuff.
. . .
#10 0xffff00000047900c in panic (
fmt=0x12 <error: Cannot access memory at address 0x12>)
at /usr/main-src/sys/kern/kern_shutdown.c:843
#11 0xffff0000002226b4 in nvme_qpair_process_completions (
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:938
#13 0xffff000000495bf8 in softclock_call_cc (c=0xffffa0000b0539a0,
at /usr/main-src/sys/kern/kern_timeout.c:696
#14 0xffff000000495fb0 in softclock (arg=0xffff000000de3500 <cc_cpu+768>)
at /usr/main-src/sys/kern/kern_timeout.c:816
#15 0xffff0000004356dc in intr_event_execute_handlers (p=<optimized out>,
ie=0xffffa000058bc700) at /usr/main-src/sys/kern/kern_intr.c:1168
#16 ithread_execute_handlers (p=<optimized out>, ie=0xffffa000058bc700)
at /usr/main-src/sys/kern/kern_intr.c:1181
at /usr/main-src/sys/kern/kern_intr.c:1269
#18 0xffff000000431f6c in fork_exit (
callout=0xffff000000435430 <ithread_loop>, arg=0xffffa000058aef60,
frame=0xffff0000eb7cc990) at /usr/main-src/sys/kern/kern_fork.c:1083
#19 <signal handler called>
So via kgdb . . .
(kgdb) up 11
617 KASSERT(0, ("received completion for unknown cmd"));
(kgdb) print/x cpl.cid
$4 = 0x7e
(kgdb) print/x qpair->num_entries
$5 = 0x100
qpair->act_tr = malloc_domainset(sizeof(struct nvme_tracker *) *
qpair->num_entries, M_NVME, DOMAINSET_PREF(qpair->domain),
M_ZERO | M_WAITOK);
(kgdb) print qpair->act_tr
$6 = (struct nvme_tracker **) 0xffffa00008725800
(kgdb) x/256g 0xffffa00008725800
0xffffa00008725800: 0x0000000000000000 0x0000000000000000
0xffffa00008725810: 0x0000000000000000 0x0000000000000000
. . .
0xffffa00008725fe0: 0x0000000000000000 0x0000000000000000
0xffffa00008725ff0: 0x0000000000000000 0x0000000000000000
It was all zeros (null pointers). No "live" pointers and, so,
no tr's to inspect.
As none of this is familiar context beyond general programming
concepts, it may be some time before I find anything else
potentially of interest to report. If you have other specific
things you would like me to look at, let me know.
A fairly obvious thing I should have provided:

(kgdb) print/x *qpair
$15 = {ctrlr = 0xffff0000fe154000, id = 0x3, domain = 0x0, cpu = 0x2, vector = 0x3, rid = 0x4, res = 0xffffa000086ded80, tag = 0xffffa0000877b780, num_entries = 0x100, num_trackers = 0x80,
sq_tdbl_off = 0x1018, cq_hdbl_off = 0x101c, phase = 0x1, sq_head = 0x1f, sq_tail = 0x20, cq_head = 0x20, num_cmds = 0x420, num_intr_handler_calls = 0xe66c, num_retries = 0x0, num_failures = 0x0,
cmd = 0xffff000100ebb000, cpl = 0xffff000100ebf000, dma_tag = 0xffffa0000b093e00, dma_tag_payload = 0xffffa000059ef000, queuemem_map = 0xffffa00005a07700, cmd_bus_addr = 0xacbb000,
cpl_bus_addr = 0xacbf000, free_tr = {tqh_first = 0xffffa0000b053a80, tqh_last = 0xffffa0000869da80}, outstanding_tr = {tqh_first = 0xffffa0000b053980, tqh_last = 0xffffa0000b053980}, queued_req = {
stqh_first = 0x0, stqh_last = 0xffffa000087243c8}, act_tr = 0xffffa00008725800, is_enabled = 0x1, lock = {lock_object = {lo_name = 0xffff00000090321f, lo_flags = 0x1030000, lo_data = 0x0,
lo_witness = 0xffffa0043fd96080}, mtx_lock = 0x0}}

Looks like I need to boot into the non-debug builds for the
other problem I'm testing for repeatability after a commit.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Mark Millard via freebsd-current
2021-05-29 08:15:34 UTC
Permalink
Post by Mark Millard via freebsd-current
Post by Mark Millard via freebsd-current
Post by Mark Millard via freebsd-arm
# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cid 0x7e has no currently active command. The cid is used by the driver
to map completions back to requests.
So, there's usually 3 possibilities that I've seen this with.
(1) There's a missing cache flush so you get a bogus cpl back because something stale
was read. It's unlikely to be this one because the rest of this look like a successful
command completed: sc = 0 is successful completion and sct is a generic command queued.
(2) We're looking at the completion record twice because we failed to properly update the
head pointer and we've already completed the command. I've only ever seen this in a
panic situation where we interrupt the completion routine because something else
paniced.
(3) There's something that's corrupting the act_tr array in the qpair. I've not seen this,
but if something else smashes that area (zeroing it in this case), then that could cause
an error like this.
Of note may be that I buildworld and buildkernel with extra
tuning enabled, targeting the cortex-a72. In one past example
this lead to finding a missing synchronization related to XHCI
handling that was fixed. (The fix was not aarch64 specific at
all.) For that: A cortex-a53 did not show the problem with or
without that tuning. A cortex-a72 showed the problem only with
the cortex-a72 tuning, not with targeting a cortex-a53 tuning
or generic armv7, for example.
Not that I've any evidence specifically suggesting such would
be involved here. But it might be good to keep in mind as a
possaibility.
Post by Mark Millard via freebsd-arm
Or it could be something new I've not seen nor thought about before.
cpuid = 3
time = 1621743752
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>
Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.
"db> dump" seems to have worked.
After the reboot, zpool scrub found no errors.
# uname -apKU
If you have the dump, I suggest starting to make sure that the act_tr array looks sane. Make
sure all the live pointers point to a sane looking tr. Make sure that tr is on the active list, etc
It will take a fair amount of driver reading, though, to see how we got here. I'd also check to
make sure that qpair->num_enttries > cpl.cid (0x3e in this case).
Okay. I got this while trying to test an odd diff -r over NFS
issue with the more recent software. So the two will potentially
compete for time.
As investigation will be exploratory for me, not familiar, I'll
probably publish periodic notes on things as I go along looking
at stuff.
. . .
#10 0xffff00000047900c in panic (
fmt=0x12 <error: Cannot access memory at address 0x12>)
at /usr/main-src/sys/kern/kern_shutdown.c:843
#11 0xffff0000002226b4 in nvme_qpair_process_completions (
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:938
#13 0xffff000000495bf8 in softclock_call_cc (c=0xffffa0000b0539a0,
at /usr/main-src/sys/kern/kern_timeout.c:696
#14 0xffff000000495fb0 in softclock (arg=0xffff000000de3500 <cc_cpu+768>)
at /usr/main-src/sys/kern/kern_timeout.c:816
#15 0xffff0000004356dc in intr_event_execute_handlers (p=<optimized out>,
ie=0xffffa000058bc700) at /usr/main-src/sys/kern/kern_intr.c:1168
#16 ithread_execute_handlers (p=<optimized out>, ie=0xffffa000058bc700)
at /usr/main-src/sys/kern/kern_intr.c:1181
at /usr/main-src/sys/kern/kern_intr.c:1269
#18 0xffff000000431f6c in fork_exit (
callout=0xffff000000435430 <ithread_loop>, arg=0xffffa000058aef60,
frame=0xffff0000eb7cc990) at /usr/main-src/sys/kern/kern_fork.c:1083
#19 <signal handler called>
So via kgdb . . .
(kgdb) up 11
617 KASSERT(0, ("received completion for unknown cmd"));
(kgdb) print/x cpl.cid
$4 = 0x7e
(kgdb) print/x qpair->num_entries
$5 = 0x100
qpair->act_tr = malloc_domainset(sizeof(struct nvme_tracker *) *
qpair->num_entries, M_NVME, DOMAINSET_PREF(qpair->domain),
M_ZERO | M_WAITOK);
(kgdb) print qpair->act_tr
$6 = (struct nvme_tracker **) 0xffffa00008725800
(kgdb) x/256g 0xffffa00008725800
0xffffa00008725800: 0x0000000000000000 0x0000000000000000
0xffffa00008725810: 0x0000000000000000 0x0000000000000000
. . .
0xffffa00008725fe0: 0x0000000000000000 0x0000000000000000
0xffffa00008725ff0: 0x0000000000000000 0x0000000000000000
It was all zeros (null pointers). No "live" pointers and, so,
no tr's to inspect.
As none of this is familiar context beyond general programming
concepts, it may be some time before I find anything else
potentially of interest to report. If you have other specific
things you would like me to look at, let me know.
(kgdb) print/x *qpair
$15 = {ctrlr = 0xffff0000fe154000, id = 0x3, domain = 0x0, cpu = 0x2, vector = 0x3, rid = 0x4, res = 0xffffa000086ded80, tag = 0xffffa0000877b780, num_entries = 0x100, num_trackers = 0x80,
sq_tdbl_off = 0x1018, cq_hdbl_off = 0x101c, phase = 0x1, sq_head = 0x1f, sq_tail = 0x20, cq_head = 0x20, num_cmds = 0x420, num_intr_handler_calls = 0xe66c, num_retries = 0x0, num_failures = 0x0,
cmd = 0xffff000100ebb000, cpl = 0xffff000100ebf000, dma_tag = 0xffffa0000b093e00, dma_tag_payload = 0xffffa000059ef000, queuemem_map = 0xffffa00005a07700, cmd_bus_addr = 0xacbb000,
cpl_bus_addr = 0xacbf000, free_tr = {tqh_first = 0xffffa0000b053a80, tqh_last = 0xffffa0000869da80}, outstanding_tr = {tqh_first = 0xffffa0000b053980, tqh_last = 0xffffa0000b053980}, queued_req = {
stqh_first = 0x0, stqh_last = 0xffffa000087243c8}, act_tr = 0xffffa00008725800, is_enabled = 0x1, lock = {lock_object = {lo_name = 0xffff00000090321f, lo_flags = 0x1030000, lo_data = 0x0,
lo_witness = 0xffffa0043fd96080}, mtx_lock = 0x0}}
Looks like I need to boot into the non-debug builds for the
other problem I'm testing for repeatability after a commit.
I've no figured out anything interesting so far.

But I've run into something that looks odd to me
(not that I've any evidence it is related to the
panic, more likely my ignorance):

There is a use of atomic_store_rel_int(&qpair->cq_head, 0)
for which I do not find any matching atomic_load_acq_int
use (or other explicit _acq), so so there is no "synchronizes
with" status in the code to establish an ordering across
threads that involves the atomic_store_rel_int that I've
found, just implicit/default relaxed-load-operations. A grep
for "cq_head" under dev/nvme/ shows only:

./dev/nvme/nvme_private.h: uint32_t cq_head;
./dev/nvme/nvme_sysctl.c: SYSCTL_ADD_UINT(ctrlr_ctx, que_list, OID_AUTO, "cq_head",
./dev/nvme/nvme_sysctl.c: CTLFLAG_RD, &qpair->cq_head, 0,
./dev/nvme/nvme_qpair.c: * below, but before we can reset cq_head to zero at 2. Also cope with
./dev/nvme/nvme_qpair.c: if (qpair->cq_head == qpair->num_entries) {
./dev/nvme/nvme_qpair.c: * Here we know that we need to zero cq_head and then negate
./dev/nvme/nvme_qpair.c: * the phase, which hasn't been assigned if cq_head isn't
./dev/nvme/nvme_qpair.c: qpair->cq_head = 0;
./dev/nvme/nvme_qpair.c: } else if (qpair->cq_head == 0) {
./dev/nvme/nvme_qpair.c: cpl = qpair->cpl[qpair->cq_head];
./dev/nvme/nvme_qpair.c: * qpair->cq_head at 1 below. Later, we re-enter this
./dev/nvme/nvme_qpair.c: * won't have updated cq_head. Rather than panic again,
./dev/nvme/nvme_qpair.c: nvme_dump_completion(&qpair->cpl[qpair->cq_head]);
./dev/nvme/nvme_qpair.c: if (++qpair->cq_head == qpair->num_entries) { /* 1 */
./dev/nvme/nvme_qpair.c: atomic_store_rel_int(&qpair->cq_head, 0); /* 2 */
./dev/nvme/nvme_qpair.c: qpair->cq_hdbl_off, qpair->cq_head);
./dev/nvme/nvme_qpair.c: qpair->sq_head = qpair->sq_tail = qpair->cq_head = 0;

(2 lines above the last has the atomic_store_rel_int use.)

atomic_thread_fence_rel use would have "synchronizes
with" based on ordinary loads reading something stored
after the atomic_thread_fence_rel. Such is documented
in "man atomic". But that is not what this code is doing.
"man atomic" does not mention ordinary loads getting such
a status by reading what an atomic_store_rel_int wrote.
It does reference the atomic_thread_fence_rel related
status for ordinary loads.

So I'm clueless about what is intended to be going on
relative to that "atomic_store_rel_int(&qpair->cq_head, 0)".
Overall, the code does not appear to me to match up with the
aarch64, powerpc64, or powerpc code generation requirements
to have any matching "synchronizes with" relationships.
(I'll not list machine instruction sequences here. I'd have
to look up the detailed sequences. But, as I remember, more
than a load is involved in the code sequences for the
acquire side on these types of processors. Nothing in the
source indicates to generate the additional code as far as
I can tell.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Mark Millard via freebsd-current
2021-05-29 20:25:04 UTC
Permalink
Post by Mark Millard via freebsd-current
Post by Mark Millard via freebsd-current
Post by Mark Millard via freebsd-current
Post by Mark Millard via freebsd-arm
# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cid 0x7e has no currently active command. The cid is used by the driver
to map completions back to requests.
So, there's usually 3 possibilities that I've seen this with.
(1) There's a missing cache flush so you get a bogus cpl back because something stale
was read. It's unlikely to be this one because the rest of this look like a successful
command completed: sc = 0 is successful completion and sct is a generic command queued.
(2) We're looking at the completion record twice because we failed to properly update the
head pointer and we've already completed the command. I've only ever seen this in a
panic situation where we interrupt the completion routine because something else
paniced.
(3) There's something that's corrupting the act_tr array in the qpair. I've not seen this,
but if something else smashes that area (zeroing it in this case), then that could cause
an error like this.
Of note may be that I buildworld and buildkernel with extra
tuning enabled, targeting the cortex-a72. In one past example
this lead to finding a missing synchronization related to XHCI
handling that was fixed. (The fix was not aarch64 specific at
all.) For that: A cortex-a53 did not show the problem with or
without that tuning. A cortex-a72 showed the problem only with
the cortex-a72 tuning, not with targeting a cortex-a53 tuning
or generic armv7, for example.
Not that I've any evidence specifically suggesting such would
be involved here. But it might be good to keep in mind as a
possaibility.
Post by Mark Millard via freebsd-arm
Or it could be something new I've not seen nor thought about before.
cpuid = 3
time = 1621743752
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
softclock() at softclock+0x60
ithread_loop() at ithread_loop+0x2a8
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100028 ]
Stopped at kdb_enter+0x48: undefined f904411f
db>
Based on the "nvme" references, I expect this is tied to
handling the Optane 480 GiByte that is in the PCIe slot
and is the boot/only media for the machine doing the diff.
"db> dump" seems to have worked.
After the reboot, zpool scrub found no errors.
# uname -apKU
If you have the dump, I suggest starting to make sure that the act_tr array looks sane. Make
sure all the live pointers point to a sane looking tr. Make sure that tr is on the active list, etc
It will take a fair amount of driver reading, though, to see how we got here. I'd also check to
make sure that qpair->num_enttries > cpl.cid (0x3e in this case).
Okay. I got this while trying to test an odd diff -r over NFS
issue with the more recent software. So the two will potentially
compete for time.
As investigation will be exploratory for me, not familiar, I'll
probably publish periodic notes on things as I go along looking
at stuff.
. . .
#10 0xffff00000047900c in panic (
fmt=0x12 <error: Cannot access memory at address 0x12>)
at /usr/main-src/sys/kern/kern_shutdown.c:843
#11 0xffff0000002226b4 in nvme_qpair_process_completions (
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
at /usr/main-src/sys/dev/nvme/nvme_qpair.c:938
#13 0xffff000000495bf8 in softclock_call_cc (c=0xffffa0000b0539a0,
at /usr/main-src/sys/kern/kern_timeout.c:696
#14 0xffff000000495fb0 in softclock (arg=0xffff000000de3500 <cc_cpu+768>)
at /usr/main-src/sys/kern/kern_timeout.c:816
#15 0xffff0000004356dc in intr_event_execute_handlers (p=<optimized out>,
ie=0xffffa000058bc700) at /usr/main-src/sys/kern/kern_intr.c:1168
#16 ithread_execute_handlers (p=<optimized out>, ie=0xffffa000058bc700)
at /usr/main-src/sys/kern/kern_intr.c:1181
at /usr/main-src/sys/kern/kern_intr.c:1269
#18 0xffff000000431f6c in fork_exit (
callout=0xffff000000435430 <ithread_loop>, arg=0xffffa000058aef60,
frame=0xffff0000eb7cc990) at /usr/main-src/sys/kern/kern_fork.c:1083
#19 <signal handler called>
So via kgdb . . .
(kgdb) up 11
617 KASSERT(0, ("received completion for unknown cmd"));
(kgdb) print/x cpl.cid
$4 = 0x7e
(kgdb) print/x qpair->num_entries
$5 = 0x100
qpair->act_tr = malloc_domainset(sizeof(struct nvme_tracker *) *
qpair->num_entries, M_NVME, DOMAINSET_PREF(qpair->domain),
M_ZERO | M_WAITOK);
(kgdb) print qpair->act_tr
$6 = (struct nvme_tracker **) 0xffffa00008725800
(kgdb) x/256g 0xffffa00008725800
0xffffa00008725800: 0x0000000000000000 0x0000000000000000
0xffffa00008725810: 0x0000000000000000 0x0000000000000000
. . .
0xffffa00008725fe0: 0x0000000000000000 0x0000000000000000
0xffffa00008725ff0: 0x0000000000000000 0x0000000000000000
It was all zeros (null pointers). No "live" pointers and, so,
no tr's to inspect.
As none of this is familiar context beyond general programming
concepts, it may be some time before I find anything else
potentially of interest to report. If you have other specific
things you would like me to look at, let me know.
(kgdb) print/x *qpair
$15 = {ctrlr = 0xffff0000fe154000, id = 0x3, domain = 0x0, cpu = 0x2, vector = 0x3, rid = 0x4, res = 0xffffa000086ded80, tag = 0xffffa0000877b780, num_entries = 0x100, num_trackers = 0x80,
sq_tdbl_off = 0x1018, cq_hdbl_off = 0x101c, phase = 0x1, sq_head = 0x1f, sq_tail = 0x20, cq_head = 0x20, num_cmds = 0x420, num_intr_handler_calls = 0xe66c, num_retries = 0x0, num_failures = 0x0,
cmd = 0xffff000100ebb000, cpl = 0xffff000100ebf000, dma_tag = 0xffffa0000b093e00, dma_tag_payload = 0xffffa000059ef000, queuemem_map = 0xffffa00005a07700, cmd_bus_addr = 0xacbb000,
cpl_bus_addr = 0xacbf000, free_tr = {tqh_first = 0xffffa0000b053a80, tqh_last = 0xffffa0000869da80}, outstanding_tr = {tqh_first = 0xffffa0000b053980, tqh_last = 0xffffa0000b053980}, queued_req = {
stqh_first = 0x0, stqh_last = 0xffffa000087243c8}, act_tr = 0xffffa00008725800, is_enabled = 0x1, lock = {lock_object = {lo_name = 0xffff00000090321f, lo_flags = 0x1030000, lo_data = 0x0,
lo_witness = 0xffffa0043fd96080}, mtx_lock = 0x0}}
Looks like I need to boot into the non-debug builds for the
other problem I'm testing for repeatability after a commit.
I've no figured out anything interesting so far.
But I've run into something that looks odd to me
(not that I've any evidence it is related to the
There is a use of atomic_store_rel_int(&qpair->cq_head, 0)
for which I do not find any matching atomic_load_acq_int
use (or other explicit _acq), so so there is no "synchronizes
with" status in the code to establish an ordering across
threads that involves the atomic_store_rel_int that I've
found, just implicit/default relaxed-load-operations. A grep
./dev/nvme/nvme_private.h: uint32_t cq_head;
./dev/nvme/nvme_sysctl.c: SYSCTL_ADD_UINT(ctrlr_ctx, que_list, OID_AUTO, "cq_head",
./dev/nvme/nvme_sysctl.c: CTLFLAG_RD, &qpair->cq_head, 0,
./dev/nvme/nvme_qpair.c: * below, but before we can reset cq_head to zero at 2. Also cope with
./dev/nvme/nvme_qpair.c: if (qpair->cq_head == qpair->num_entries) {
./dev/nvme/nvme_qpair.c: * Here we know that we need to zero cq_head and then negate
./dev/nvme/nvme_qpair.c: * the phase, which hasn't been assigned if cq_head isn't
./dev/nvme/nvme_qpair.c: qpair->cq_head = 0;
./dev/nvme/nvme_qpair.c: } else if (qpair->cq_head == 0) {
./dev/nvme/nvme_qpair.c: cpl = qpair->cpl[qpair->cq_head];
./dev/nvme/nvme_qpair.c: * qpair->cq_head at 1 below. Later, we re-enter this
./dev/nvme/nvme_qpair.c: * won't have updated cq_head. Rather than panic again,
./dev/nvme/nvme_qpair.c: nvme_dump_completion(&qpair->cpl[qpair->cq_head]);
./dev/nvme/nvme_qpair.c: if (++qpair->cq_head == qpair->num_entries) { /* 1 */
./dev/nvme/nvme_qpair.c: atomic_store_rel_int(&qpair->cq_head, 0); /* 2 */
./dev/nvme/nvme_qpair.c: qpair->cq_hdbl_off, qpair->cq_head);
./dev/nvme/nvme_qpair.c: qpair->sq_head = qpair->sq_tail = qpair->cq_head = 0;
(2 lines above the last has the atomic_store_rel_int use.)
atomic_thread_fence_rel use would have "synchronizes
with" based on ordinary loads reading something stored
after the atomic_thread_fence_rel. Such is documented
in "man atomic". But that is not what this code is doing.
"man atomic" does not mention ordinary loads getting such
a status by reading what an atomic_store_rel_int wrote.
It does reference the atomic_thread_fence_rel related
status for ordinary loads.
So I'm clueless about what is intended to be going on
relative to that "atomic_store_rel_int(&qpair->cq_head, 0)".
Overall, the code does not appear to me to match up with the
aarch64, powerpc64, or powerpc code generation requirements
to have any matching "synchronizes with" relationships.
(I'll not list machine instruction sequences here. I'd have
to look up the detailed sequences. But, as I remember, more
than a load is involved in the code sequences for the
acquire side on these types of processors. Nothing in the
source indicates to generate the additional code as far as
I can tell.)
Adding notes about the code generation, using fairly
weak sequences (these are for C++/C11 semantics as
I understand):

(Picking prefix-form for powerpc:)
powerpc Load Relaxed vs. Acquire: ld vs. ld;cmp;bc;isync
powerpc Fence: Acquire: lwsync
powerpc Store Relaxed vs. Release: st vs. "Fence: Release";st
powerpc Fence: Release: lwsync

Note: I am unsure how the above mixes with FreeBSD's
plain-load vs. release-fence rules for atomic, for
example. I've never seen a instruction level breakdowns
of what implements FreeBSD's specific rules for
powerpc64.

aarch64 Load Relaxed vs. Acquire: LDR vs. LDAR
aarch64 Fence: Acquire: DMB ISH LD
aarch64 Store Relaxed vs. Release: STR vs. STLR
aarch64 Fence: Release: DMB ISH

Note: Again I am unsure of the pain-load vs.
acquire-fence rule for FreeBSD. And, here,
similarly for the plain-store vs. release-fence
rule. I've never seen a instruction level breakdowns
of what implements FreeBSD's specific rules for
aarch64.

armv7 Load Relaxed vs. Acquire: ldr vs. ldr;"Fence: Acquire" (or ldr;teq;beq;isb)
armv7 Fence: Acquire: dmb ish
armv7 Store Relaxed vs. Release: str vs. "Fence: Release";str
armv7 Fence: Release: dmb ish

Note: The above fits with FreeBSD's plain-load vs.
release-fence and plain-stores vs. acquire-fence rules
because the fences are used to form Store-Release and
Load-Acquire.

(If I gather correctly FreeBSD does not support aarch32 for its
distinctions from armv7, for example the aarch32's alternative
SeqCst mapping that does not mix well with the armv7 mapping.
Still I list aarch32 for reference.)

aarch32 Load Relaxed vs. Acquire: LDR vs. LDA
aarch32 Fence: Acquire: DMB ISH LD
aarch32 Store Relaxed vs. Release: STR vs. STL
aarch32 Fence: Release: DMB ISH

Note: Again I would be unsure of the pain-load
vs. acquire-fence rule for FreeBSD. Similarly
for the plain-store vs. release-fence rule.

I got these code sequences from:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

I was unable to use the code sequences to figure out
the intended operation of the:

atomic_store_rel_int(&qpair->cq_head, 0)

absent any atomic_load_acq_int to match --and everything
else being based on just plain load/store for cq_head .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Loading...