CFT: TRIM Consolidation on UFS/FFS filesystems
Kirk McKusick
2018-08-20 19:40:56 UTC
I have recently added TRIM consolidation support for the UFS/FFS
filesystem. This feature consolidates large numbers of TRIM commands
into a much smaller number of commands covering larger blocks of
disk space. It is best described by the commit message:

Author: mckusick
Date: Sun Aug 19 16:56:42 2018
New Revision: 338056
URL: https://svnweb.freebsd.org/changeset/base/338056

Log:
Add consolidation of TRIM / BIO_DELETE commands to the UFS/FFS filesystem.

When deleting files on filesystems that are stored on flash-memory
(solid-state) disk drives, the filesystem notifies the underlying
disk of the blocks that it is no longer using. The notification
allows the drive to avoid saving these blocks when it needs to
flash (zero out) one of its flash pages. These notifications of
no-longer-being-used blocks are referred to as TRIM notifications.
In FreeBSD these TRIM notifications are sent from the filesystem
to the drive using the BIO_DELETE command.

Until now, the filesystem would send a separate message to the drive
for each block of the file that was deleted. Each Gigabyte of file
size resulted in over 3000 TRIM messages being sent to the drive.
This burst of messages can overwhelm the drive's task queue causing
multiple second delays for read and write requests.

This implementation collects runs of contiguous blocks in the file
and then consolidates them into a single BIO_DELETE command to the
drive. The BIO_DELETE command describes the run of blocks as a
single large block being deleted. Each Gigabyte of file size can
result in as few as two BIO_DELETE commands and typically results in
fewer than ten. Though these larger BIO_DELETE commands take longer to
run, they do not clog the drive task queue, so read and write
commands can intersperse effectively with them.

Though this new feature has been thoroughly reviewed and tested, it
is being added disabled by default so as to minimize the possibility
of disrupting the upcoming 12.0 release. It can be enabled by running
``sysctl vfs.ffs.dotrimcons=1''. Users are encouraged to test it.
If no problems arise, we will consider requesting that it be enabled
by default for 12.0.

Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
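
As a rough illustration of the consolidation idea in the commit message, the sketch below merges a sequence of freed block numbers into runs of contiguous blocks, so that each run could be reported with one request instead of one request per block. This is not the code from r338056. The names `trim_run` and `coalesce_trim_runs` are hypothetical, and the sketch assumes blocks arrive in release order and that only a block directly following the current run extends it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One consolidated delete request: `count` blocks starting at `start`. */
struct trim_run {
	uint64_t start;
	uint64_t count;
};

/*
 * Merge freed block numbers into runs of contiguous blocks.
 * A block extends the current run only if it immediately follows it;
 * otherwise a new run is started.  `out` must have room for `nblocks`
 * entries.  Returns the number of runs written to `out`.
 */
size_t
coalesce_trim_runs(const uint64_t *blocks, size_t nblocks,
    struct trim_run *out)
{
	size_t nruns = 0;

	assert(nblocks == 0 || (blocks != NULL && out != NULL));
	for (size_t i = 0; i < nblocks; i++) {
		if (nruns > 0 &&
		    out[nruns - 1].start + out[nruns - 1].count == blocks[i]) {
			/* Contiguous with the current run: grow it. */
			out[nruns - 1].count++;
		} else {
			/* Gap or first block: start a new run. */
			out[nruns].start = blocks[i];
			out[nruns].count = 1;
			nruns++;
		}
	}
	return (nruns);
}
```

With this approach a fully contiguous file collapses to a single run, while a fragmented file yields one run per fragment, which is consistent with the "as few as two, typically fewer than ten" range quoted in the log.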

This support is off by default, but I am hoping to get enough testing
to confirm that it (a) works and (b) is helpful, so that it will be
reasonable to turn it on by default in 12.0. The cutoff for turning it
on by default in 12.0 is September 19th, so I am requesting your testing
feedback in the near term. Please let me know whether you have managed
to use it successfully (or not) and whether it made any performance
difference (good or bad).

To enable TRIM consolidation, either use `sysctl vfs.ffs.dotrimcons=1'
or just set the `dotrimcons' variable in sys/ufs/ffs/ffs_alloc.c to 1.

Everything you need to test TRIM consolidation is obtained by setting
the above sysctl. However, if you want to collect statistics on how
effectively TRIM consolidation is working, the attached diff will
let you gather them easily. Compile your kernel and the mount command.
Note that if you do not do a buildworld, you will need to copy
/sys/sys/mount.h to /usr/include/sys/mount.h to get the patched mount
command to compile. Then run `mount -v' (or `mount -v | grep /mnt' to
get just the statistics for /mnt).

Removing a 30MB file without TRIM consolidation:
/dev/md0 on /mnt (ufs, local, writes: sync 10 async 482, reads: sync 7 async 0, fsid d43f795b6a7d34fb, TRIM: total 952 total blocks 7616)

Removing the same file with TRIM consolidation:
/dev/md0 on /mnt (ufs, local, writes: sync 10 async 482, reads: sync 7 async 0, fsid d43f795b6a7d34fb, TRIM: total 3 total blocks 7616)
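
To put the two `mount -v' samples above in perspective, a small helper can turn the "TRIM: total" and "total blocks" counters into an average request size and a reduction factor. This is only a sketch for interpreting the numbers. The function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average number of blocks covered by each TRIM request.
 * Returns 0.0 when no TRIM requests have been issued.
 */
double
avg_blocks_per_trim(uint64_t trim_total, uint64_t trim_total_blks)
{
	if (trim_total == 0)
		return (0.0);
	return ((double)trim_total_blks / (double)trim_total);
}

/*
 * Factor by which the number of TRIM requests shrank between two runs
 * that freed the same blocks.  `after` must be non-zero.
 */
double
trim_reduction_factor(uint64_t before, uint64_t after)
{
	assert(after != 0);
	return ((double)before / (double)after);
}
```

For the samples above, 7616 blocks in 952 requests is 8 blocks per request without consolidation, while 7616 blocks in 3 requests is roughly 2539 blocks per request with it, a reduction in request count of about 317 times.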

It also tracks pending blocks and pending files. These numbers are only
printed out when they are non-zero. Here is an example running with soft
updates right after a file has been rm'ed, but its blocks not yet released:
/dev/md0 on /mnt (ufs, local, soft-updates, writes: sync 2 async 251, reads: sync 5 async 0, fsid 303f795b1be0c459, pending blocks 7616, pending files 1)

Finally it tracks inflight BIO_DELETEs and total blocks represented by
those inflight BIO_DELETEs. These numbers are also only printed out when
they are non-zero. These statistics let you see how much of a backlog
of BIO_DELETEs you have backed up at/in the disk drive and you can track
how quickly they drain.
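
One way to track how quickly the backlog drains is to take two readings of the inflight block count a known time apart and compute a rate from them. The sketch below assumes you have collected the samples yourself (for example by running `mount -v' twice). The `trim_sample' type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One reading of the inflight counter, taken at time t_sec (seconds). */
struct trim_sample {
	double t_sec;
	uint64_t inflight_blks;
};

/*
 * Blocks drained per second between an earlier sample `a' and a later
 * sample `b'.  The result is negative if the backlog grew.
 */
double
trim_drain_rate(struct trim_sample a, struct trim_sample b)
{
	assert(b.t_sec > a.t_sec);
	return (((double)a.inflight_blks - (double)b.inflight_blks) /
	    (b.t_sec - a.t_sec));
}

/*
 * Estimated seconds until the backlog in `latest' is empty at the given
 * rate, or -1.0 if the backlog is not draining.
 */
double
trim_drain_eta(struct trim_sample latest, double rate)
{
	if (rate <= 0.0)
		return (-1.0);
	return ((double)latest.inflight_blks / rate);
}
```

A rate that stays near zero while the inflight count is large would suggest the drive is struggling to keep up with the BIO_DELETEs it has been handed.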

Kirk McKusick
Kirk McKusick
2018-08-20 19:59:03 UTC
From: Kirk McKusick <***@mckusick.com>
To: FreeBSD Current <freebsd-***@FreeBSD.org>,
FreeBSD Filesystems <freebsd-***@FreeBSD.org>
Subject: CFT: TRIM Consolidation on UFS/FFS filesystems
Date: Mon, 20 Aug 2018 12:40:56 -0700

Oops, forgot that attachments get stripped. Below are the diffs for
gathering statistics. Sorry to those of you on Gmail for whom they
will be mangled.

Kirk McKusick

=-=-=

Index: sbin/mount/mount.c
===================================================================
--- sbin/mount/mount.c (revision 338054)
+++ sbin/mount/mount.c (working copy)
@@ -686,6 +686,18 @@ prmount(struct statfs *sfp)
 			for (i = 0; i < sizeof(sfp->f_fsid); i++)
 				printf("%02x", ((u_char *)&sfp->f_fsid)[i]);
 		}
+		if (sfp->f_trim_total != 0 || sfp->f_trim_total_blks != 0)
+			(void)printf(", TRIM: total %ju total blocks %ju",
+			    (uintmax_t)sfp->f_trim_total,
+			    (uintmax_t)sfp->f_trim_total_blks);
+		if (sfp->f_trim_inflight != 0 || sfp->f_trim_inflight_blks != 0)
+			(void)printf(", TRIM: inflight %ju inflight blocks %ju",
+			    (uintmax_t)sfp->f_trim_inflight,
+			    (uintmax_t)sfp->f_trim_inflight_blks);
+		if (sfp->f_pendingblks != 0 || sfp->f_pendingfiles != 0)
+			(void)printf(", pending blocks %ju, pending files %ju",
+			    (uintmax_t)sfp->f_pendingblks,
+			    (uintmax_t)sfp->f_pendingfiles);
 	}
 	(void)printf(")\n");
 }
Index: sys/sys/mount.h
===================================================================
--- sys/sys/mount.h (revision 338054)
+++ sys/sys/mount.h (working copy)
@@ -85,7 +85,13 @@ struct statfs {
 	uint64_t f_asyncwrites; /* count of async writes since mount */
 	uint64_t f_syncreads; /* count of sync reads since mount */
 	uint64_t f_asyncreads; /* count of async reads since mount */
-	uint64_t f_spare[10]; /* unused spare */
+	uint64_t f_trim_total; /* count of TRIM ops since mount */
+	uint64_t f_trim_total_blks; /* count of TRIM blocks since mount */
+	uint64_t f_trim_inflight; /* count of TRIM ops in progress */
+	uint64_t f_trim_inflight_blks; /* count of TRIM blocks in progress */
+	int64_t f_pendingblks; /* pending free blocks */
+	int64_t f_pendingfiles; /* pending free nodes */
+	uint64_t f_spare[4]; /* unused spare */
 	uint32_t f_namemax; /* maximum filename length */
 	uid_t f_owner; /* user that mounted the filesystem */
 	fsid_t f_fsid; /* filesystem id */
Index: sys/ufs/ffs/ffs_vfsops.c
===================================================================
--- sys/ufs/ffs/ffs_vfsops.c (revision 338081)
+++ sys/ufs/ffs/ffs_vfsops.c (working copy)
@@ -1398,7 +1398,13 @@ ffs_statfs(mp, sbp)
 	sbp->f_bsize = fs->fs_fsize;
 	sbp->f_iosize = fs->fs_bsize;
 	sbp->f_blocks = fs->fs_dsize;
+	sbp->f_pendingblks = dbtofsb(fs, fs->fs_pendingblocks);
+	sbp->f_pendingfiles = fs->fs_pendinginodes;
 	UFS_LOCK(ump);
+	sbp->f_trim_total = ump->um_trim_total;
+	sbp->f_trim_total_blks = ump->um_trim_total_blks;
+	sbp->f_trim_inflight = ump->um_trim_inflight;
+	sbp->f_trim_inflight_blks = ump->um_trim_inflight_blks;
 	sbp->f_bfree = fs->fs_cstotal.cs_nbfree * fs->fs_frag +
 	    fs->fs_cstotal.cs_nffree + dbtofsb(fs, fs->fs_pendingblocks);
 	sbp->f_bavail = freespace(fs, fs->fs_minfree) +
bob prohaska
2018-08-22 00:48:43 UTC
Post by Kirk McKusick
To enable TRIM consolidation, either use `sysctl vfs.ffs.dotrimcons=1'
or just set the `dotrimcons' variable in sys/ufs/ffs/ffs_alloc.c to 1.
Will the new feature be active on a Raspberry Pi 3 using flash
on microSD and USB for file systems and swap?

Can the feature be turned on using one of the conf files in /etc?


According to Sandisk,
"All microSD or USB drives are flash memory and does support the TRIM command, however,
you will not notice any difference after running TRIM command on memory cards or USB
drives. TRIM command is basically used for SSD and Hard drives."

The "you will not notice any difference...." qualification makes me slightly uncertain
the reply was well-informed, but if there's any hope of success I'd like to try it.
From time to time there seem to be traffic jams among flash devices on the RPI3; it
would be a pleasant surprise if this feature helps.

Thanks for reading!

bob prohaska
Mark Millard
2018-08-22 01:47:19 UTC
bob prohaska fbsd at www.zefox.net wrote on
Post by bob prohaska
Post by Kirk McKusick
. . .
To enable TRIM consolidation, either use `sysctl vfs.ffs.dotrimcons=1'
or just set the `dotrimcons' variable in sys/ufs/ffs/ffs_alloc.c to 1.
Will the new feature be active on a Raspberry Pi 3 using flash
on microSD and USB for file systems and swap?
Even if a USB device contains appropriate storage, that does
not mean that the USB protocol in use has a way to request the
operation. (The same applies when translation stages other than
the USB protocol are involved.)

For FreeBSD, UFS and ZFS have support if the requests can be sent
through all the stages. Swap partitions do not have support, even
if the device supports TRIM through all the stages.

(See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048 for
why I do not otherwise mention swap files.)

RPI3's use (some subset of?) USB 2.0, as I remember. I'm not aware
of that protocol supporting TRIM requests. (I'm no expert, however.)
Thus, as I understand things, UFS and ZFS end up unable to do TRIM
in such contexts.
Post by bob prohaska
Can the feature be turned on using one of the conf files in /etc?
At least for UFS there are commands for configuration, such as
tunefs and newfs, that include control of such points. I do not
remember for ZFS.

As I remember, if you enable it on UFS but the way the device is
connected cannot actually support it, FreeBSD reports the issue
at mount time or some such.

I've used a SSD both directly via SATA and via a USB enclosure,
the same partitions/file systems across the uses. Only when it
was SATA-style-use did TRIM work.
Post by bob prohaska
According to Sandisk,
"All microSD or USB drives are flash memory and does support the TRIM command, however,
you will not notice any difference after running TRIM command on memory cards or USB
drives. TRIM command is basically used for SSD and Hard drives."
This gets back into what the protocols in use allow to be
requested when direct communication with the flash is not
in use. (More may be involved.)
Post by bob prohaska
The "you will not notice any difference...." qualification makes me slightly uncertain
the reply was well-informed, but if there's any hope of success I'd like to try it.
From time to time there seem to be traffic jams among flash devices on the RPI3; it
would be a pleasant surprise if this feature helps.
I'll note that gstat with -d allows watching the "BIO_DELETE"
operations (in FreeBSD terms). One can see if they are what
time is being spent on.

Quoting g_bio(9) :

BIO_DELETE Indicates that a certain range of data is no
longer used and that it can be erased or
freed as the underlying technology supports.
Technologies like flash adaptation layers can
arrange to erase the relevant blocks before
they will become reassigned and cryptographic
devices may want to fill random bits into the
range to reduce the amount of data available
for attack.

In your rpi3/2 experiments if you watch the column
sequence:

d/s kBps ms/d

I expect that you will find that they stay at:

0 0 0.0

indicating lack of use.



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
bob prohaska
2018-08-23 03:38:27 UTC
Post by Mark Millard
I've used a SSD both directly via SATA and via a USB enclosure,
the same partitions/file systems across the uses. Only when it
was SATA-style-use did TRIM work.
This is likely the key to my question. If USB blocks the TRIM service
the behavior of the device doesn't matter.

As an aside, Sandisk now says:
"Please be informed that we have not tested running TRIM commands on USB flash drive
and microSD cards therefore we would not be able to comment on it explicitly."

Thanks for reading,

bob prohaska
Jan Henrik Sylvester
2018-08-23 11:11:58 UTC
Post by bob prohaska
Post by Mark Millard
I've used a SSD both directly via SATA and via a USB enclosure,
the same partitions/file systems across the uses. Only when it
was SATA-style-use did TRIM work.
This is likely the key to my question. If USB blocks the TRIM service
the behavior of the device doesn't matter.
This is kind of off-topic in this thread about UFS, but if you
investigate TRIM on USB enclosures:

Some of them advertise TRIM support, for example Startech SM21BMU31C3
(based on Asmedia ASM1351 USB 3.1 Gen 2 chipset), but that is not the
whole story. Using the UASP protocol, they pass on the ATA TRIM command,
which is used by Windows for NTFS trim support, but they do not pass the
SCSI UNMAP command, which is used by Linux. Sorry, I have not yet tested
this on FreeBSD, but on Linux, security erase of the entire SSD works
with the enclosure I have just mentioned, whereas trimming of a
filesystem (fstrim) does not work.

I have had exactly one enclosure that offered trimming on filesystems on
Linux: I have bought it on Ebay directly from China and I think it is
based on JMicron JMS567 USB 3.0 chipset. I have not found an mSATA
enclosure from any vendor in Europe that has this chipset. Of course,
having the right chipset is not enough, either, the firmware also has to
support it.

Please correct me if I am wrong, but I think FreeBSD does not implement
UASP yet. Hence, I doubt there will be any kind of trim support for any
USB-SATA bridge on FreeBSD, and even security erase will probably not be
passed on.

Cheers,
Jan Henrik
