On 2022/06/09 12:53, lijiang wrote:
On Thu, Jun 9, 2022 at 9:35 AM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab(a)nec.com <mailto:k-hagio-ab@nec.com>> wrote:
Hi Lianbo,
On 2022/06/08 22:27, lijiang wrote:
> Hi, Kazu
> Thank you for the fix.
>
> On Tue, Jun 7, 2022 at 4:42 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com
<mailto:k-hagio-ab@nec.com> <mailto:k-hagio-ab@nec.com
<mailto:k-hagio-ab@nec.com>>> wrote:
>
> There have been some reports that the "dev -d|-D" options
displayed
> incorrect I/O stats due to racy blk_mq_ctx.rq_completed value.
> To fix it, make the options use sbitmap to count I/O stats on RHEL8
> and adjust to its blk_mq_tags structure.
>
> Signed-off-by: Kazuhito Hagio <k-hagio-ab(a)nec.com
<mailto:k-hagio-ab@nec.com> <mailto:k-hagio-ab@nec.com
<mailto:k-hagio-ab@nec.com>>>
> ---
> The patch tested ok with upstream 4.0 to 5.18 kernels and several
> RHEL7 and RHEL8 ones.
>
> dev.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/dev.c b/dev.c
> index 0172c83ffaea..733e3a8a40cd 100644
> --- a/dev.c
> +++ b/dev.c
> @@ -4339,6 +4339,10 @@ static void bt_for_each(ulong q, ulong tags, ulong
sbq, uint reserved, uint nr_r
> static void queue_for_each_hw_ctx(ulong q, ulong *hctx, uint cnt, struct
diskio *dio)
> {
> uint i;
> + int bitmap_tags_is_ptr = 0;
> +
> + if (MEMBER_TYPE("blk_mq_tags", "bitmap_tags")
== TYPE_CODE_PTR)
> + bitmap_tags_is_ptr = 1;
>
>
> The above change is related to the kernel(v5.10-rc1) commit:
> 222a5ae03cdd ("blk-mq: Use pointers for blk_mq_tags bitmap tags")
>
> Could you please add it to the patch log?
Sure, I'll add them. Thank you for the information.
I did not search for the upstream patches.
Thanks.
>
> And later, kernel(v5.16-rc1) stops using pointers again:
> ae0f1a732f4a ("blk-mq: Stop using pointers for blk_mq_tags bitmap
tags")
>
> for (i = 0; i < cnt; i++) {
> ulong addr = 0, tags = 0;
> @@ -4357,9 +4361,17 @@ static void queue_for_each_hw_ctx(ulong q, ulong
*hctx, uint cnt, struct diskio
>
> if (nr_reserved_tags) {
> addr = tags +
OFFSET(blk_mq_tags_breserved_tags);
> + if (bitmap_tags_is_ptr &&
> + !readmem(addr, KVADDR, &addr,
sizeof(ulong),
> +
"blk_mq_tags.bitmap_tags", RETURN_ON_ERROR))
> + break;
> bt_for_each(q, tags, addr, 1, nr_reserved_tags,
dio);
> }
> addr = tags + OFFSET(blk_mq_tags_bitmap_tags);
> + if (bitmap_tags_is_ptr &&
> + !readmem(addr, KVADDR, &addr, sizeof(ulong),
> + "blk_mq_tags.bitmap_tags",
RETURN_ON_ERROR))
> + break;
> bt_for_each(q, tags, addr, 0, nr_reserved_tags, dio);
> }
> }
> @@ -4423,8 +4435,13 @@ get_mq_diskio(unsigned long q, unsigned long
*mq_count)
> unsigned long mctx_addr;
> struct diskio tmp = {0};
>
> - if (INVALID_MEMBER(blk_mq_ctx_rq_dispatched) ||
> - INVALID_MEMBER(blk_mq_ctx_rq_completed)) {
> + /*
> + * Currently it does not support earlier sbitmap and blk-mq
> + * implementation e.g. RHEL7, so filter them out.
> + */
> + if (VALID_STRUCT(sbitmap) &&
> + VALID_MEMBER(sbitmap_word_cleared) &&
> + VALID_MEMBER(request_state)) {
>
>
> The struct sbitmap was introduced in the kernel v4.9-rc1:
> 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap
library")
>
> The member of cleared in the struct sbitmap_word was added in the kernel
v5.0-rc1:
> ea86ea2cdced ("sbitmap: ammortize cost of clearing bits")
>
> The member of state in struct request was added in the kernel v4.18-rc1:
> 12f5b9314545 ("blk-mq: Remove generation seqeunce")
>
> It indicates that the current "dev -d" command won't work on the
vmcores generated with the kernel v5.0-rc1 and
> earlier(still use the rq_dispatched and rq_completed to count the IOs on the
old vmcores). These three symbols
> exist in the different kernel versions and put them together, it seems to be
confused. And it would increase the
> maintenance difficulty in the future.>
> To be on the safe side, I would suggest doing that in the distribution, because
the developers often backport some
> upstream patches into the distribution, it could be more reasonable. Any
ideas?
sorry, but I cannot see what is confusing and causes maintenance difficulty.
Do you mean that we should use only one macro checking if the latest
supported version v5.0, i.e. VALID_MEMBER(sbitmap_word_cleared) here?
But as you say, distributions can backport only a part of upstream patches,
so having several checks is safer, I think.
Yes, VALID_STRUCT(sbitmap) is unnecessary and can be removed. I added it
for readability, it says that we cannot use the function without sbitmap.
Is it better to remove this?
It could be good to keep only VALID_MEMBER(request_state) as you mentioned, once
crash tool picks up the patch in the attachment.
OK, I will send that attached patch first after testing.
Thanks,
Kazu
>
>
> btw, I'm preparing a patch to support sbitmap without ea86ea2cdced,
> to extend the supported version of 'sbitmapq' command. (attached)
> We can fix it first if you think it's better. And then, probably
> we will keep only VALID_MEMBER(request_state).
>
> This looks better to me. Thank you for the explanation, Kazu.
>
> Thanks,
> Kazu
>