Discussion:
FYI for aarch64 main [14] running a mid March version: I ended up with [usb{usbus2}] stuck at (near) 100% cpu
Mark Millard via freebsd-current
2021-05-14 05:59:48 UTC
Permalink
Note: The context was using a non-debug main build
from mid-2021-Mar. (More details identified
later.)

The issue happend while attempting a:

# zfs send -R ***@for-copy | zfs recv -Fdv zpnew

where the drives involved in the command were:

zpold: a USB3 SSD, using /dev/da0p3
zpnew: an 480 GiByte Optane in the PCIe slot, using /dev/nda0p3

with:

# gpart show -pl
=> 40 468862048 da0 GPT (224G)
40 532480 da0p1 4C8GCA72EFI (260M)
532520 2008 - free - (1.0M)
534528 29360128 da0p2 4C8GCA72swp14 (14G)
29894656 4194304 - free - (2.0G)
34088960 33554432 da0p4 4C8GCA72swp16 (16G)
67643392 401217536 da0p3 4C8GCA72zfs (191G)
468860928 1160 - free - (580K)

=> 40 2000409184 ada0 GPT (954G)
40 409600 ada0p1 (null) (200M)
409640 1740636160 ada0p2 FBSDmacchroot (830G)
1741045800 58720256 ada0p3 FBSDmacchswp0 (28G)
1799766056 176160768 ada0p4 FBSDmacchswp1 (84G)
1975926824 24482400 - free - (12G)

=> 40 937703008 nda0 GPT (447G)
40 532480 nda0p1 CA72opt0EFI (260M)
532520 2008 - free - (1.0M)
534528 117440512 nda0p2 CA72opt0swp56 (56G)
117975040 16777216 - free - (8.0G)
134752256 134217728 nda0p4 CA72opt0swp64 (64G)
268969984 668731392 nda0p3 CA72opt0zfs (319G)
937701376 1672 - free - (836K)

The system running was that on /dev/ada0p2 (FBSDmacchroot,
which is UFS instead of ZFS).

The [usb{usbus2}] process eventually got stuck-busy, no
more I/O:

CPU 0: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle

PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
15 root -72 - 0B 262144B CPU0 0 8:51 99.95% [usb{usbus2}]

1295 root -8 0 20108Ki 8092Ki q->bq_ 2 0:04 0.00% zfs recv -Fdv zpnew{receive_writer_thre}
1295 root 48 0 20108Ki 8092Ki piperd 2 0:22 0.00% zfs recv -Fdv zpnew{zfs}
1294 root -8 0 17544Ki 7740Ki q->bq_ 2 0:01 0.00% zfs send -R ***@for-copy{send_reader_thread}
1294 root -8 0 17544Ki 7740Ki q->bq_ 0 0:00 0.00% zfs send -R ***@for-copy{send_merge_thread}
1294 root -8 0 17544Ki 7740Ki hdr->b 2 0:00 0.00% zfs send -R ***@for-copy{send_traverse_threa}
1294 root 52 0 17544Ki 7740Ki range- 3 0:20 0.00% zfs send -R ***@for-copy{zfs}

1036 root -8 - 0B 1488Ki t->zth 0 0:00 0.00% [zfskern{z_checkpoint_discar}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_livelist_condense}]
1036 root -8 - 0B 1488Ki t->zth 2 0:00 0.00% [zfskern{z_livelist_destroy}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_indirect_condense}]
1036 root -8 - 0B 1488Ki mmp->m 3 0:00 0.00% [zfskern{mmp_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 1 0:00 0.00% [zfskern{txg_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 2 0:00 0.00% [zfskern{txg_thread_enter}]

I was unable to ^c or ^z the process where I
typed the command. I eventually stopped the
system with "shutdown -p now" from a ssh
session (that had already been in place).

When I retried after rebooting and scrubbing (no
problems found), the problem did not repeat.

I do not have more information nor a way to repeat
the problem on demand, unfortunately.

Details of the vintage of the system software and
such:

# ~/fbsd-based-on-what-freebsd-main.sh
FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG arm64 aarch64 1400005 1400005
def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
merge-base: CommitDate: 2021-03-12 20:29:42 +0000
7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
n245444 (--first-parent --count for merge-base)

The system was a MACCHIATObin Double Shot.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Greg V
2021-05-14 09:06:17 UTC
Permalink
Post by Mark Millard via freebsd-current
The [usb{usbus2}] process eventually got stuck-busy, no
The system was a MACCHIATObin Double Shot.
I've noticed USB issues on the mcbin too.
In my experience uaudio was the most likely thing to freeze: sometimes when playing audio the whole USB stack would hang, IIRC only unplugging and reconnecting the hub would let USB I/O continue.

Looks like the XHCI hardware might not be fully stable here?? but I do wonder if we could possibly lack some recovery mechanisms for this situation that other OSes have.
Rodney W. Grimes
2021-05-14 12:52:10 UTC
Permalink
Post by Mark Millard via freebsd-current
Note: The context was using a non-debug main build
from mid-2021-Mar. (More details identified
later.)
zpold: a USB3 SSD, using /dev/da0p3
zpnew: an 480 GiByte Optane in the PCIe slot, using /dev/nda0p3
# gpart show -pl
=> 40 468862048 da0 GPT (224G)
40 532480 da0p1 4C8GCA72EFI (260M)
532520 2008 - free - (1.0M)
534528 29360128 da0p2 4C8GCA72swp14 (14G)
29894656 4194304 - free - (2.0G)
34088960 33554432 da0p4 4C8GCA72swp16 (16G)
67643392 401217536 da0p3 4C8GCA72zfs (191G)
468860928 1160 - free - (580K)
=> 40 2000409184 ada0 GPT (954G)
40 409600 ada0p1 (null) (200M)
409640 1740636160 ada0p2 FBSDmacchroot (830G)
1741045800 58720256 ada0p3 FBSDmacchswp0 (28G)
1799766056 176160768 ada0p4 FBSDmacchswp1 (84G)
1975926824 24482400 - free - (12G)
=> 40 937703008 nda0 GPT (447G)
40 532480 nda0p1 CA72opt0EFI (260M)
532520 2008 - free - (1.0M)
534528 117440512 nda0p2 CA72opt0swp56 (56G)
117975040 16777216 - free - (8.0G)
134752256 134217728 nda0p4 CA72opt0swp64 (64G)
268969984 668731392 nda0p3 CA72opt0zfs (319G)
937701376 1672 - free - (836K)
The system running was that on /dev/ada0p2 (FBSDmacchroot,
which is UFS instead of ZFS).
The [usb{usbus2}] process eventually got stuck-busy, no
CPU 0: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
15 root -72 - 0B 262144B CPU0 0 8:51 99.95% [usb{usbus2}]
1295 root -8 0 20108Ki 8092Ki q->bq_ 2 0:04 0.00% zfs recv -Fdv zpnew{receive_writer_thre}
1295 root 48 0 20108Ki 8092Ki piperd 2 0:22 0.00% zfs recv -Fdv zpnew{zfs}
1036 root -8 - 0B 1488Ki t->zth 0 0:00 0.00% [zfskern{z_checkpoint_discar}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_livelist_condense}]
1036 root -8 - 0B 1488Ki t->zth 2 0:00 0.00% [zfskern{z_livelist_destroy}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_indirect_condense}]
1036 root -8 - 0B 1488Ki mmp->m 3 0:00 0.00% [zfskern{mmp_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 1 0:00 0.00% [zfskern{txg_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 2 0:00 0.00% [zfskern{txg_thread_enter}]
I was unable to ^c or ^z the process where I
typed the command. I eventually stopped the
system with "shutdown -p now" from a ssh
session (that had already been in place).
Should this occur again before doing the shutdown run a
zpool status &
I have gotten in this state when the recv pool was a usb device
and for some reason it had a timeout and gone offline. The clue
this occured are in dmesg, and zpool status.

Unplug/plug the USB device, check dmesg that it came online,
and do a zpool clear.
Post by Mark Millard via freebsd-current
When I retried after rebooting and scrubbing (no
problems found), the problem did not repeat.
I do not have more information nor a way to repeat
the problem on demand, unfortunately.
Details of the vintage of the system software and
# ~/fbsd-based-on-what-freebsd-main.sh
FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG arm64 aarch64 1400005 1400005
def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
merge-base: CommitDate: 2021-03-12 20:29:42 +0000
7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
n245444 (--first-parent --count for merge-base)
The system was a MACCHIATObin Double Shot.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-current
--
Rod Grimes ***@freebsd.org
Mark Millard via freebsd-current
2021-05-14 17:07:13 UTC
Permalink
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
Note: The context was using a non-debug main build
from mid-2021-Mar. (More details identified
later.)
zpold: a USB3 SSD, using /dev/da0p3
zpnew: an 480 GiByte Optane in the PCIe slot, using /dev/nda0p3
# gpart show -pl
=> 40 468862048 da0 GPT (224G)
40 532480 da0p1 4C8GCA72EFI (260M)
532520 2008 - free - (1.0M)
534528 29360128 da0p2 4C8GCA72swp14 (14G)
29894656 4194304 - free - (2.0G)
34088960 33554432 da0p4 4C8GCA72swp16 (16G)
67643392 401217536 da0p3 4C8GCA72zfs (191G)
468860928 1160 - free - (580K)
=> 40 2000409184 ada0 GPT (954G)
40 409600 ada0p1 (null) (200M)
409640 1740636160 ada0p2 FBSDmacchroot (830G)
1741045800 58720256 ada0p3 FBSDmacchswp0 (28G)
1799766056 176160768 ada0p4 FBSDmacchswp1 (84G)
1975926824 24482400 - free - (12G)
=> 40 937703008 nda0 GPT (447G)
40 532480 nda0p1 CA72opt0EFI (260M)
532520 2008 - free - (1.0M)
534528 117440512 nda0p2 CA72opt0swp56 (56G)
117975040 16777216 - free - (8.0G)
134752256 134217728 nda0p4 CA72opt0swp64 (64G)
268969984 668731392 nda0p3 CA72opt0zfs (319G)
937701376 1672 - free - (836K)
The system running was that on /dev/ada0p2 (FBSDmacchroot,
which is UFS instead of ZFS).
The [usb{usbus2}] process eventually got stuck-busy, no
CPU 0: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
15 root -72 - 0B 262144B CPU0 0 8:51 99.95% [usb{usbus2}]
1295 root -8 0 20108Ki 8092Ki q->bq_ 2 0:04 0.00% zfs recv -Fdv zpnew{receive_writer_thre}
1295 root 48 0 20108Ki 8092Ki piperd 2 0:22 0.00% zfs recv -Fdv zpnew{zfs}
1036 root -8 - 0B 1488Ki t->zth 0 0:00 0.00% [zfskern{z_checkpoint_discar}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_livelist_condense}]
1036 root -8 - 0B 1488Ki t->zth 2 0:00 0.00% [zfskern{z_livelist_destroy}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_indirect_condense}]
1036 root -8 - 0B 1488Ki mmp->m 3 0:00 0.00% [zfskern{mmp_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 1 0:00 0.00% [zfskern{txg_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 2 0:00 0.00% [zfskern{txg_thread_enter}]
I was unable to ^c or ^z the process where I
typed the command. I eventually stopped the
system with "shutdown -p now" from a ssh
session (that had already been in place).
Should this occur again before doing the shutdown run a
zpool status &
I have gotten in this state when the recv pool was a usb device
The USB device had the send pool in my example.
Post by Rodney W. Grimes
and for some reason it had a timeout and gone offline.
No messages about timeouts or other such were made.
Post by Rodney W. Grimes
The clue
this occured are in dmesg, and zpool status.
No console, dmesg -a, or /var/log/messages output were
generated. (And the system was running from a SATA SSD
that was operating well.)

For reference, the USB EtherNet device that was in a
USB2 port continued to operate just fine, allowing the
use of existing ssh sessions that were displaying
gstat -spod and top -Samio -ototal until I started
looking at the problem. (I did not try making a new
ssh session).

I did not do the "zpool status" so I can not report
about it.
Post by Rodney W. Grimes
Unplug/plug the USB device, check dmesg that it came online,
and do a zpool clear.
Okay. I normally avoid unplugging USB storage media
if the system overall does not hang up: hopes of
a clean shutdown leaving things better.

The system did appear to shutdown to completion.
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
When I retried after rebooting and scrubbing (no
problems found), the problem did not repeat.
I do not have more information nor a way to repeat
the problem on demand, unfortunately.
Details of the vintage of the system software and
# ~/fbsd-based-on-what-freebsd-main.sh
FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG arm64 aarch64 1400005 1400005
def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
merge-base: CommitDate: 2021-03-12 20:29:42 +0000
7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
n245444 (--first-parent --count for merge-base)
The system was a MACCHIATObin Double Shot.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Rodney W. Grimes
2021-05-14 20:48:48 UTC
Permalink
Post by Mark Millard via freebsd-current
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
Note: The context was using a non-debug main build
from mid-2021-Mar. (More details identified
later.)
zpold: a USB3 SSD, using /dev/da0p3
zpnew: an 480 GiByte Optane in the PCIe slot, using /dev/nda0p3
# gpart show -pl
=> 40 468862048 da0 GPT (224G)
40 532480 da0p1 4C8GCA72EFI (260M)
532520 2008 - free - (1.0M)
534528 29360128 da0p2 4C8GCA72swp14 (14G)
29894656 4194304 - free - (2.0G)
34088960 33554432 da0p4 4C8GCA72swp16 (16G)
67643392 401217536 da0p3 4C8GCA72zfs (191G)
468860928 1160 - free - (580K)
=> 40 2000409184 ada0 GPT (954G)
40 409600 ada0p1 (null) (200M)
409640 1740636160 ada0p2 FBSDmacchroot (830G)
1741045800 58720256 ada0p3 FBSDmacchswp0 (28G)
1799766056 176160768 ada0p4 FBSDmacchswp1 (84G)
1975926824 24482400 - free - (12G)
=> 40 937703008 nda0 GPT (447G)
40 532480 nda0p1 CA72opt0EFI (260M)
532520 2008 - free - (1.0M)
534528 117440512 nda0p2 CA72opt0swp56 (56G)
117975040 16777216 - free - (8.0G)
134752256 134217728 nda0p4 CA72opt0swp64 (64G)
268969984 668731392 nda0p3 CA72opt0zfs (319G)
937701376 1672 - free - (836K)
The system running was that on /dev/ada0p2 (FBSDmacchroot,
which is UFS instead of ZFS).
The [usb{usbus2}] process eventually got stuck-busy, no
CPU 0: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
15 root -72 - 0B 262144B CPU0 0 8:51 99.95% [usb{usbus2}]
1295 root -8 0 20108Ki 8092Ki q->bq_ 2 0:04 0.00% zfs recv -Fdv zpnew{receive_writer_thre}
1295 root 48 0 20108Ki 8092Ki piperd 2 0:22 0.00% zfs recv -Fdv zpnew{zfs}
1036 root -8 - 0B 1488Ki t->zth 0 0:00 0.00% [zfskern{z_checkpoint_discar}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_livelist_condense}]
1036 root -8 - 0B 1488Ki t->zth 2 0:00 0.00% [zfskern{z_livelist_destroy}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_indirect_condense}]
1036 root -8 - 0B 1488Ki mmp->m 3 0:00 0.00% [zfskern{mmp_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 1 0:00 0.00% [zfskern{txg_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 2 0:00 0.00% [zfskern{txg_thread_enter}]
I was unable to ^c or ^z the process where I
typed the command. I eventually stopped the
system with "shutdown -p now" from a ssh
session (that had already been in place).
Should this occur again before doing the shutdown run a
zpool status &
I have gotten in this state when the recv pool was a usb device
The USB device had the send pool in my example.
Post by Rodney W. Grimes
and for some reason it had a timeout and gone offline.
No messages about timeouts or other such were made.
Post by Rodney W. Grimes
The clue
this occured are in dmesg, and zpool status.
No console, dmesg -a, or /var/log/messages output were
generated. (And the system was running from a SATA SSD
that was operating well.)
For reference, the USB EtherNet device that was in a
USB2 port continued to operate just fine, allowing the
use of existing ssh sessions that were displaying
gstat -spod and top -Samio -ototal until I started
looking at the problem. (I did not try making a new
ssh session).
I did not do the "zpool status" so I can not report
about it.
Post by Rodney W. Grimes
Unplug/plug the USB device, check dmesg that it came online,
and do a zpool clear.
Okay. I normally avoid unplugging USB storage media
if the system overall does not hang up: hopes of
a clean shutdown leaving things better.
Do the zpool status, and only if that indicates a device offline
or other problem would you proceed to do the unplug/plug, as at
that point zfs has stopped doing anything to the device and
your shutdown wont do anything as far as zfs for that pool anyway.
Post by Mark Millard via freebsd-current
The system did appear to shutdown to completion.
Yes, you can shutdown a system with a zpool in failed state.
Post by Mark Millard via freebsd-current
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
When I retried after rebooting and scrubbing (no
problems found), the problem did not repeat.
I do not have more information nor a way to repeat
the problem on demand, unfortunately.
Details of the vintage of the system software and
# ~/fbsd-based-on-what-freebsd-main.sh
FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG arm64 aarch64 1400005 1400005
def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
merge-base: CommitDate: 2021-03-12 20:29:42 +0000
7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
n245444 (--first-parent --count for merge-base)
The system was a MACCHIATObin Double Shot.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-current
--
Rod Grimes ***@freebsd.org
Mark Millard via freebsd-current
2021-05-14 21:51:27 UTC
Permalink
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
Note: The context was using a non-debug main build
from mid-2021-Mar. (More details identified
later.)
zpold: a USB3 SSD, using /dev/da0p3
zpnew: an 480 GiByte Optane in the PCIe slot, using /dev/nda0p3
# gpart show -pl
=> 40 468862048 da0 GPT (224G)
40 532480 da0p1 4C8GCA72EFI (260M)
532520 2008 - free - (1.0M)
534528 29360128 da0p2 4C8GCA72swp14 (14G)
29894656 4194304 - free - (2.0G)
34088960 33554432 da0p4 4C8GCA72swp16 (16G)
67643392 401217536 da0p3 4C8GCA72zfs (191G)
468860928 1160 - free - (580K)
=> 40 2000409184 ada0 GPT (954G)
40 409600 ada0p1 (null) (200M)
409640 1740636160 ada0p2 FBSDmacchroot (830G)
1741045800 58720256 ada0p3 FBSDmacchswp0 (28G)
1799766056 176160768 ada0p4 FBSDmacchswp1 (84G)
1975926824 24482400 - free - (12G)
=> 40 937703008 nda0 GPT (447G)
40 532480 nda0p1 CA72opt0EFI (260M)
532520 2008 - free - (1.0M)
534528 117440512 nda0p2 CA72opt0swp56 (56G)
117975040 16777216 - free - (8.0G)
134752256 134217728 nda0p4 CA72opt0swp64 (64G)
268969984 668731392 nda0p3 CA72opt0zfs (319G)
937701376 1672 - free - (836K)
The system running was that on /dev/ada0p2 (FBSDmacchroot,
which is UFS instead of ZFS).
The [usb{usbus2}] process eventually got stuck-busy, no
CPU 0: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
15 root -72 - 0B 262144B CPU0 0 8:51 99.95% [usb{usbus2}]
1295 root -8 0 20108Ki 8092Ki q->bq_ 2 0:04 0.00% zfs recv -Fdv zpnew{receive_writer_thre}
1295 root 48 0 20108Ki 8092Ki piperd 2 0:22 0.00% zfs recv -Fdv zpnew{zfs}
1036 root -8 - 0B 1488Ki t->zth 0 0:00 0.00% [zfskern{z_checkpoint_discar}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_livelist_condense}]
1036 root -8 - 0B 1488Ki t->zth 2 0:00 0.00% [zfskern{z_livelist_destroy}]
1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_indirect_condense}]
1036 root -8 - 0B 1488Ki mmp->m 3 0:00 0.00% [zfskern{mmp_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 1 0:00 0.00% [zfskern{txg_thread_enter}]
1036 root -8 - 0B 1488Ki tx->tx 2 0:00 0.00% [zfskern{txg_thread_enter}]
I was unable to ^c or ^z the process where I
typed the command. I eventually stopped the
system with "shutdown -p now" from a ssh
session (that had already been in place).
Should this occur again before doing the shutdown run a
zpool status &
I have gotten in this state when the recv pool was a usb device
The USB device had the send pool in my example.
Post by Rodney W. Grimes
and for some reason it had a timeout and gone offline.
No messages about timeouts or other such were made.
Post by Rodney W. Grimes
The clue
this occured are in dmesg, and zpool status.
No console, dmesg -a, or /var/log/messages output were
generated. (And the system was running from a SATA SSD
that was operating well.)
For reference, the USB EtherNet device that was in a
USB2 port continued to operate just fine, allowing the
use of existing ssh sessions that were displaying
gstat -spod and top -Samio -ototal until I started
looking at the problem. (I did not try making a new
ssh session).
I did not do the "zpool status" so I can not report
about it.
Post by Rodney W. Grimes
Unplug/plug the USB device, check dmesg that it came online,
and do a zpool clear.
Okay. I normally avoid unplugging USB storage media
if the system overall does not hang up: hopes of
a clean shutdown leaving things better.
Do the zpool status, and only if that indicates a device offline
or other problem would you proceed to do the unplug/plug, as at
that point zfs has stopped doing anything to the device and
your shutdown wont do anything as far as zfs for that pool anyway.
Post by Mark Millard via freebsd-current
The system did appear to shutdown to completion.
Yes, you can shutdown a system with a zpool in failed state.
I still have the console output available and I was
wrong: the shutdown hung up and apparently I cut
power before "All buffers synced". All it got to
was:

Stopping cron.
Waiting for PIDS: 858.
Stopping sshd.
Waiting for PIDS: 852.
fstab: /etc/fstab:6: Inappropriate file type or format
Stopping ntpd.
Waiting for PIDS: 807.
Stopping nfsd.
Waiting for PIDS: 781 782.
Stopping mountd.
Waiting for PIDS: 779.
Stopping rpcbind.
Waiting for PIDS: 741.
Stopping devd.
Waiting for PIDS: 431.
Writing entropy file: .
Writing early boot entropy file: .
.
Terminated
May 13 20:16:26 FBSDmacch syslogd: exiting on signal 15

It did not get to any of the usual sort of (from a different
shutdown):

Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 0 0 0 0 0 0 done
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-3' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-4' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... done
All buffers synced.
Uptime: 8m12s
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
Post by Rodney W. Grimes
Post by Mark Millard via freebsd-current
When I retried after rebooting and scrubbing (no
problems found), the problem did not repeat.
I do not have more information nor a way to repeat
the problem on demand, unfortunately.
Details of the vintage of the system software and
# ~/fbsd-based-on-what-freebsd-main.sh
FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG arm64 aarch64 1400005 1400005
def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
merge-base: CommitDate: 2021-03-12 20:29:42 +0000
7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
n245444 (--first-parent --count for merge-base)
The system was a MACCHIATObin Double Shot.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Loading...