Discussion: stable/13, vm page counts do not add up
Andriy Gapon
2021-04-07 19:42:57 UTC
I regularly see that top's memory line does not add up (and by a lot).
That can be seen with vm.stats as well.

For example:
$ sysctl vm.stats | fgrep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 3231
vm.stats.vm.v_laundry_count: 262058
vm.stats.vm.v_inactive_count: 3054178
vm.stats.vm.v_active_count: 621131
vm.stats.vm.v_wire_count: 1871176
vm.stats.vm.v_free_count: 187777
vm.stats.vm.v_page_count: 8134982

$ bc
187777 + 1871176 + 621131 + 3054178 + 262058
5996320
8134982 - 5996320
2138662

As you can see, it's not a small number of pages either.
Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on this
system.
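
For reference, the same arithmetic as a small sh sketch (just a convenience; the counter names are the ones from the sysctl output above):

#!/bin/sh
# Sum the free/wired/queue counters and report the pages left unaccounted for.
get() { sysctl -n "vm.stats.vm.$1"; }
total=$(get v_page_count)
sum=$(( $(get v_free_count) + $(get v_wire_count) + $(get v_active_count) ))
sum=$(( sum + $(get v_inactive_count) + $(get v_laundry_count) ))
echo "unaccounted pages: $(( total - sum ))"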

This is 47c00a9835926e96, 13.0-STABLE amd64.
I do not think that I saw anything like that when I used (much) older FreeBSD.
--
Andriy Gapon
Mark Johnston
2021-04-07 19:54:41 UTC
Post by Andriy Gapon
I regularly see that the top's memory line does not add up (and by a lot).
That can be seen with vm.stats as well.
$ sysctl vm.stats | fgrep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 3231
vm.stats.vm.v_laundry_count: 262058
vm.stats.vm.v_inactive_count: 3054178
vm.stats.vm.v_active_count: 621131
vm.stats.vm.v_wire_count: 1871176
vm.stats.vm.v_free_count: 187777
vm.stats.vm.v_page_count: 8134982
$ bc
187777 + 1871176 + 621131 + 3054178 + 262058
5996320
8134982 - 5996320
2138662
As you can see, it's not a small number of pages either.
Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on this
system.
This is 47c00a9835926e96, 13.0-STABLE amd64.
I do not think that I saw anything like that when I used (much) older FreeBSD.
One relevant change is that vm_page_wire() no longer removes pages from
LRU queues, so the count of pages in the queues can include wired pages.
If the page daemon runs, it will dequeue any wired pages that are
encountered.

This was done to reduce queue lock contention: operations like
sendfile(), which transiently wire pages, would otherwise trigger two
queue operations per page. Now that queue operations are batched, this
might not be as important.

We could perhaps add a new flavour of vm_page_wire() which is not lazy
and would be suited to, e.g., the buffer cache. What is the primary
source of wired pages in this case?
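
If it helps to narrow that down, a few stock places to look, as a sketch (nothing specific to this problem, just the usual counters):

sysctl vm.stats.vm.v_wire_count   # total wired pages
sysctl vfs.bufspace               # buffer cache usage, in bytes
vmstat -z | head -20              # per-zone UMA usage (wired kernel memory)
vmstat -m | head -20              # malloc(9) type usage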
Andriy Gapon
2021-04-07 20:22:41 UTC
Post by Mark Johnston
Post by Andriy Gapon
I regularly see that the top's memory line does not add up (and by a lot).
That can be seen with vm.stats as well.
$ sysctl vm.stats | fgrep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 3231
vm.stats.vm.v_laundry_count: 262058
vm.stats.vm.v_inactive_count: 3054178
vm.stats.vm.v_active_count: 621131
vm.stats.vm.v_wire_count: 1871176
vm.stats.vm.v_free_count: 187777
vm.stats.vm.v_page_count: 8134982
$ bc
187777 + 1871176 + 621131 + 3054178 + 262058
5996320
8134982 - 5996320
2138662
As you can see, it's not a small number of pages either.
Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on this
system.
This is 47c00a9835926e96, 13.0-STABLE amd64.
I do not think that I saw anything like that when I used (much) older FreeBSD.
One relevant change is that vm_page_wire() no longer removes pages from
LRU queues, so the count of pages in the queues can include wired pages.
If the page daemon runs, it will dequeue any wired pages that are
encountered.
Maybe I misunderstand how that works, but I would expect that the sum of all
counters could be greater than v_page_count at times. But in my case it's less.
Post by Mark Johnston
This was done to reduce queue lock contention, operations like
sendfile() which transiently wire pages would otherwise trigger two
queue operations per page. Now that queue operations are batched this
might not be as important.
We could perhaps add a new flavour of vm_page_wire() which is not lazy
and would be suited for e.g., the buffer cache. What is the primary
source of wired pages in this case?
It should be ZFS, I guess.
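A rough cross-check along these lines (assuming the OpenZFS arcstats sysctl is present) should show how much of the wired total the ARC accounts for:

pagesize=$(sysctl -n hw.pagesize)
wired=$(( $(sysctl -n vm.stats.vm.v_wire_count) * pagesize ))
arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
echo "wired: $(( wired / 1048576 )) MB, ARC: $(( arc / 1048576 )) MB"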
--
Andriy Gapon
Mark Johnston
2021-04-07 20:56:49 UTC
Post by Andriy Gapon
Post by Mark Johnston
Post by Andriy Gapon
I regularly see that the top's memory line does not add up (and by a lot).
That can be seen with vm.stats as well.
$ sysctl vm.stats | fgrep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 3231
vm.stats.vm.v_laundry_count: 262058
vm.stats.vm.v_inactive_count: 3054178
vm.stats.vm.v_active_count: 621131
vm.stats.vm.v_wire_count: 1871176
vm.stats.vm.v_free_count: 187777
vm.stats.vm.v_page_count: 8134982
$ bc
187777 + 1871176 + 621131 + 3054178 + 262058
5996320
8134982 - 5996320
2138662
As you can see, it's not a small number of pages either.
Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on this
system.
This is 47c00a9835926e96, 13.0-STABLE amd64.
I do not think that I saw anything like that when I used (much) older FreeBSD.
One relevant change is that vm_page_wire() no longer removes pages from
LRU queues, so the count of pages in the queues can include wired pages.
If the page daemon runs, it will dequeue any wired pages that are
encountered.
Maybe I misunderstand how that works, but I would expect that the sum of all
counters could be greater than v_page_count at times. But in my case it's less.
I misread, sorry. You're right, what I described would cause double
counting.

I don't know what might be causing it then. It could be a page leak.
The kernel allocates wired pages without adjusting the v_wire_count
counter in some cases, but the ones I know about happen at boot and
should not account for such a large disparity. I do not see it on a few
systems that I have access to.
Post by Andriy Gapon
Post by Mark Johnston
This was done to reduce queue lock contention, operations like
sendfile() which transiently wire pages would otherwise trigger two
queue operations per page. Now that queue operations are batched this
might not be as important.
We could perhaps add a new flavour of vm_page_wire() which is not lazy
and would be suited for e.g., the buffer cache. What is the primary
source of wired pages in this case?
It should be ZFS, I guess.
--
Andriy Gapon
Weiß, Dr. Jürgen
2021-04-09 22:33:14 UTC
-----Original Message-----
Sent: Wednesday, April 7, 2021 10:57 PM
Subject: Re: stable/13, vm page counts do not add up
Post by Andriy Gapon
Post by Mark Johnston
Post by Andriy Gapon
I regularly see that the top's memory line does not add up (and by a lot).
That can be seen with vm.stats as well.
$ sysctl vm.stats | fgrep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 3231
vm.stats.vm.v_laundry_count: 262058
vm.stats.vm.v_inactive_count: 3054178
vm.stats.vm.v_active_count: 621131
vm.stats.vm.v_wire_count: 1871176
vm.stats.vm.v_free_count: 187777
vm.stats.vm.v_page_count: 8134982
$ bc
187777 + 1871176 + 621131 + 3054178 + 262058
5996320
8134982 - 5996320
2138662
As you can see, it's not a small number of pages either.
Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on this
system.
This is 47c00a9835926e96, 13.0-STABLE amd64.
I do not think that I saw anything like that when I used (much) older FreeBSD.
One relevant change is that vm_page_wire() no longer removes pages from
LRU queues, so the count of pages in the queues can include wired pages.
If the page daemon runs, it will dequeue any wired pages that are
encountered.
Maybe I misunderstand how that works, but I would expect that the sum of all
counters could be greater than v_page_count at times. But in my case it's less.
I misread, sorry. You're right, what I described would cause double
counting.
I don't know what might be causing it then. It could be a page leak.
The kernel allocates wired pages without adjusting the v_wire_count
counter in some cases, but the ones I know about happen at boot and
should not account for such a large disparity. I do not see it on a few
systems that I have access to.
Post by Andriy Gapon
Post by Mark Johnston
This was done to reduce queue lock contention, operations like
sendfile() which transiently wire pages would otherwise trigger two
queue operations per page. Now that queue operations are batched this
might not be as important.
We could perhaps add a new flavour of vm_page_wire() which is not lazy
and would be suited for e.g., the buffer cache. What is the primary
source of wired pages in this case?
It should be ZFS, I guess.
--
Andriy Gapon
I see kernel memory disappearing when enabling ktls:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253281

Last test done with 13.0-RC1.

I'm a bit at a loss as to how to debug this further.

Regards

Juergen Weiss

Juergen Weiss | ***@uni-mainz.de
Andriy Gapon
2021-04-13 14:01:49 UTC
Post by Mark Johnston
I don't know what might be causing it then. It could be a page leak.
The kernel allocates wired pages without adjusting the v_wire_count
counter in some cases, but the ones I know about happen at boot and
should not account for such a large disparity. I do not see it on a few
systems that I have access to.
Mark or anyone,

do you have a suggestion on how to approach hunting for the potential page leak?
It's been a long while since I worked with that code and it changed a lot.

Here is some additional info.
I had approximately 2 million unaccounted pages.
I rebooted the system and that number became 20 thousand, which is more
reasonable and could be explained by the boot-time allocations that you mentioned.
After 30 hours of uptime the number had grown to 60 thousand.

I monitored the number and so far I could not correlate it with any activity.

P.S.
I have not been running any virtual machines.
I do use nvidia graphics driver.
--
Andriy Gapon
Mark Johnston
2021-04-13 21:18:42 UTC
Post by Andriy Gapon
Post by Mark Johnston
I don't know what might be causing it then. It could be a page leak.
The kernel allocates wired pages without adjusting the v_wire_count
counter in some cases, but the ones I know about happen at boot and
should not account for such a large disparity. I do not see it on a few
systems that I have access to.
Mark or anyone,
do you have a suggestion on how to approach hunting for the potential page leak?
It's been a long while since I worked with that code and it changed a lot.
Here is some additional info.
I had approximately 2 million unaccounted pages.
I rebooted the system and that number became 20 thousand which is more
reasonable and could be explained by those boot-time allocations that you mentioned.
After 30 hours of uptime the number became 60 thousand.
I monitored the number and so far I could not correlate it with any activity.
P.S.
I have not been running any virtual machines.
I do use nvidia graphics driver.
My guess is that something is allocating pages without VM_ALLOC_WIRED and
either they're managed and something is failing to place them in the page
queues, or they're unmanaged and should likely be counted as wired.

It is also possible that something is allocating wired, unmanaged
pages and unwiring them without freeing them. For managed pages,
vm_page_unwire() ensures they get placed in a queue.
vm_page_unwire_noq() does not, but it is typically only used with
unmanaged pages.

The nvidia drivers do not appear to call any vm_page_* functions, at
least based on the kld symbol tables.

So you might try using DTrace to collect stacks for these functions,
leaving it running for a while and comparing stack counts with the
number of pages leaked while the script is running. Something like:

/* allocations whose request flags lack VM_ALLOC_WIRED (0x20) */
fbt::vm_page_alloc_domain_after:entry
/(args[3] & 0x20) == 0/
{
        @alloc[stack()] = count();
}

fbt::vm_page_alloc_contig_domain:entry
/(args[3] & 0x20) == 0/
{
        @alloc[stack()] = count();
}

/* unwires that do not requeue the page */
fbt::vm_page_unwire_noq:entry
{
        @unwire[stack()] = count();
}

/* vm_page_unwire() on unmanaged pages (oflags & VPO_UNMANAGED, 0x4) */
fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
        @unwire[stack()] = count();
}

It might be that the count of leaked pages does not relate directly to
the counts collected by the script, e.g., because there is some race
that results in a leak. But we can try to rule out some easier cases
first.
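
As a usage sketch (the file names are just placeholders), you could leave the script running in the background and snapshot the counters periodically, then compare the deltas afterwards:

dtrace -s vm-leak.d -o vm-leak.out &
while :; do date; sysctl vm.stats | fgrep count; sleep 3600; done > counts.log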

I tried to look for possible causes of the KTLS page leak mentioned
elsewhere in this thread but can't see any obvious problems. Does your
affected system use sendfile() at all? I also wonder if you see much
mbuf usage on the system.
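
For the mbuf side, something as simple as the following is enough for a first look (standard tools only):

netstat -m                 # mbuf/cluster usage, denials, delayed allocations
vmstat -z | grep -i mbuf   # UMA zones backing mbufs and clusters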
Andriy Gapon
2021-04-14 11:21:44 UTC
Post by Mark Johnston
fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
@unwire[stack()] = count();
}
An unrelated report: dtrace complains about this probe on my stable/13 system:
failed to resolve translated type for args[0]

And I do not have any idea why...

From ctfdump:
[27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)

<1463> TYPEDEF vm_page_t refers to 778
<778> POINTER (anon) refers to 3575
<3575> STRUCT vm_page (104 bytes)
plinks type=3563 off=0
listq type=3558 off=128
object type=3564 off=256
pindex type=3565 off=320
phys_addr type=42 off=384
md type=3571 off=448
ref_count type=31 off=640
busy_lock type=31 off=672
a type=3573 off=704
order type=3 off=736
pool type=3 off=744
flags type=3 off=752
oflags type=3 off=760
psind type=2167 off=768
segind type=2167 off=776
valid type=3574 off=784
dirty type=3574 off=792
--
Andriy Gapon
Mark Johnston
2021-04-14 13:32:32 UTC
Post by Andriy Gapon
Post by Mark Johnston
fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
@unwire[stack()] = count();
}
failed to resolve translated type for args[0]
And I do not have any idea why...
There was a regression, see PR 253440. I think you have the fix
already, but perhaps not. Could you show output from
"dtrace -lv -n fbt::vm_page_unwire:entry"?
Post by Andriy Gapon
[27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)
<1463> TYPEDEF vm_page_t refers to 778
<778> POINTER (anon) refers to 3575
<3575> STRUCT vm_page (104 bytes)
plinks type=3563 off=0
listq type=3558 off=128
object type=3564 off=256
pindex type=3565 off=320
phys_addr type=42 off=384
md type=3571 off=448
ref_count type=31 off=640
busy_lock type=31 off=672
a type=3573 off=704
order type=3 off=736
pool type=3 off=744
flags type=3 off=752
oflags type=3 off=760
psind type=2167 off=768
segind type=2167 off=776
valid type=3574 off=784
dirty type=3574 off=792
--
Andriy Gapon
Andriy Gapon
2021-04-14 15:10:15 UTC
Post by Mark Johnston
Post by Andriy Gapon
Post by Mark Johnston
fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
@unwire[stack()] = count();
}
failed to resolve translated type for args[0]
And I do not have any idea why...
There was a regression, see PR 253440. I think you have the fix
already, but perhaps not. Could you show output from
"dtrace -lv -n fbt::vm_page_unwire:entry"?
dtrace -lv -n fbt::vm_page_unwire:entry
ID PROVIDER MODULE FUNCTION NAME
54323 fbt kernel vm_page_unwire entry

Probe Description Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: Unknown

Argument Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA

Argument Types
args[0]: (unknown)
args[1]: (unknown)

It seems that I should have the fix, but somehow I still have the problem.
I've been doing NO_CLEAN builds for a long while, so maybe some stale file
didn't get re-created...

It looks like dt_lex.c under /usr/obj is rather dated.

... I've removed that file and rebuilt libdtrace and everything is okay now.
Thank you.
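
For anyone hitting the same thing, a sketch of the idea (exact paths and targets may differ; a full buildworld/installworld pass is the safer route than rebuilding just libdtrace):

find /usr/obj -name dt_lex.c -print -delete
cd /usr/src
make -DNO_CLEAN buildworld      # regenerates dt_lex.c and rebuilds libdtrace
make installworld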
Post by Mark Johnston
Post by Andriy Gapon
[27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)
<1463> TYPEDEF vm_page_t refers to 778
<778> POINTER (anon) refers to 3575
<3575> STRUCT vm_page (104 bytes)
plinks type=3563 off=0
listq type=3558 off=128
object type=3564 off=256
pindex type=3565 off=320
phys_addr type=42 off=384
md type=3571 off=448
ref_count type=31 off=640
busy_lock type=31 off=672
a type=3573 off=704
order type=3 off=736
pool type=3 off=744
flags type=3 off=752
oflags type=3 off=760
psind type=2167 off=768
segind type=2167 off=776
valid type=3574 off=784
dirty type=3574 off=792
--
Andriy Gapon
--
Andriy Gapon
Rozhuk Ivan
2021-04-20 03:15:36 UTC
On Tue, 13 Apr 2021 17:18:42 -0400
Post by Andriy Gapon
P.S.
I have not been running any virtual machines.
I do use nvidia graphics driver.
In the past I filed a report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238698

Now I have switched to AMD and only about 1 GB of memory is allocated by xorg.
