On 2021-04-22 17:33, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
>> On 2021-01-12 16:24, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>> Hi Bhupesh,
>>>
>>> -----Original Message-----
>>>> We have hard-coded the HZ value for some ARCHs to either 1000 or 100
>>>> (mainly for kernel versions > 2.6.0), which causes 'help -m' to show
>>>> an incorrect hz value for various architectures.
>>>
>>> Good catch, but it seems crash uses (cfq_slice_async * 25) for machdep->hz
>>> if the symbol exists (please see task_init()). RHEL7 has it, but RHEL8 does not.
>>> What do you see on RHEL8 for x86_64 with your patch?
>>>
>>
>> The symbol 'cfq_slice_async' has been removed from upstream kernel:
>> f382fb0bcef4 ("block: remove legacy IO schedulers")
>>
>> And RHEL8 also removed it.
>>
>>> We should search for an alternate way like the current one first.
>>>
>>
>> Currently, there are several ways to get the value of HZ as below:
>>
>> [1] calculate hz via the symbol 'cfq_slice_async'
>> But this symbol has been removed from upstream kernel
>
> According to [0] below, 'cfq_slice_async' cannot be used for the HZ
> calculation on 4.8 and later kernels. I've not found a perfect alternative,
> but how about using 'bfq_timeout' for 4.12 and later, including RHEL8?
e.g. like this:
--- a/task.c
+++ b/task.c
@@ -417,7 +417,16 @@ task_init(void)
 	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
-	if (symbol_exists("cfq_slice_async")) {
+	if (symbol_exists("bfq_timeout")) {
+		uint bfq_timeout;
+		get_symbol_data("bfq_timeout", sizeof(int), &bfq_timeout);
+		if (bfq_timeout) {
+			machdep->hz = bfq_timeout * 8;
+			if (CRASHDEBUG(2))
+				fprintf(fp, "bfq_timeout exists: setting hz to %d\n",
+					machdep->hz);
+		}
+	} else if (symbol_exists("cfq_slice_async")) {
 		uint cfq_slice_async;
 		get_symbol_data("cfq_slice_async", sizeof(int),
Lianbo, could you try this on ppc64le if it looks good?
Sure. On my ppc64le machine, crash reported an hz of 96 after applying the above patch.
The reason is that the kernel calculates the value of bfq_timeout as below:
bfq_timeout = HZ / 8;
The actual value of HZ is 100, so bfq_timeout = 100 / 8 = 12 (integer division), but in
crash we calculate the value of HZ as:
HZ = bfq_timeout * 8 = 12 * 8 = 96
This is not the result we expected.
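To illustrate the truncation described above, here is a minimal sketch of the round trip. The function names are hypothetical and are not part of crash or the kernel; the arithmetic simply mirrors "bfq_timeout = HZ / 8" on the kernel side and "hz = bfq_timeout * 8" from the proposed patch.

```c
#include <stdio.h>

/* Kernel side, per the thread: bfq_timeout = HZ / 8 (integer division). */
int bfq_timeout_from_hz(int hz)
{
	return hz / 8;
}

/* crash side, as in the proposed patch: hz = bfq_timeout * 8. */
int hz_from_bfq_timeout(int bfq_timeout)
{
	return bfq_timeout * 8;
}

/* Print the round trip for a few common CONFIG_HZ settings. */
void print_bfq_round_trip(void)
{
	static const int hz_values[] = { 100, 250, 300, 1000 };
	size_t i;

	for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		int t = bfq_timeout_from_hz(hz_values[i]);

		printf("HZ=%d bfq_timeout=%d recovered=%d\n",
		       hz_values[i], t, hz_from_bfq_timeout(t));
	}
}
```

Only HZ values that are multiples of 8 survive the round trip, which would explain why x86_64 with HZ=1000 looks correct (bfq_timeout = 125) while HZ=100 comes back as 96.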
btw, I thought 'read_expire' was better than 'bfq_timeout' because it
was introduced in 2.6.16 and has been unchanged since, but most of the kernels (vmlinux)
Sounds good. But unfortunately, 'read_expire' is a static variable in the kernel, so
we cannot get it directly by symbol search. Maybe we should try to find a static
kernel variable in another way.
If it is possible, I would tend to use 'write_expire' to calculate the value of HZ
in crash as below, which avoids the above issue and gives a correct result.
HZ = write_expire / 5;
/*
* source: block/mq-deadline.c
*/
static const int write_expire = 5 * HZ;
For example:
+	if (symbol_exists("write_expire")) {	----> Here, it failed; maybe we can try to find the symbol in another way.
+		uint write_expire;
+		get_symbol_data("write_expire", sizeof(int), &write_expire);
+		if (write_expire) {
+			machdep->hz = write_expire / 5;
+			if (CRASHDEBUG(2))
+				fprintf(fp, "write_expire exists: setting hz to %d\n",
+					machdep->hz);
+		}
+	} else
that I have do not have a symbol for it. (some optimization?)
I can get the values of 'read_expire' and 'write_expire' on the latest RHEL8 or later.
crash> p read_expire
$1 = 50
crash> p write_expire
$2 = 500
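As a quick check of the values printed above, the following sketch inverts the two kernel definitions quoted in this thread ("read_expire = HZ / 2" and "write_expire = 5 * HZ"). The helper names are hypothetical and exist only for illustration.

```c
/* Kernel side, per the thread: write_expire = 5 * HZ, so this division is exact. */
int hz_from_write_expire(int write_expire)
{
	return write_expire / 5;
}

/*
 * Kernel side, per the thread: read_expire = HZ / 2. Multiplying back is
 * exact only when HZ is even, which holds for 100, 250, 300 and 1000.
 */
int hz_from_read_expire(int read_expire)
{
	return read_expire * 2;
}
```

With the values shown above, both 500 / 5 and 50 * 2 give 100. Because 'write_expire' is a multiple of HZ rather than a fraction of it, it does not suffer from the truncation seen with 'bfq_timeout'.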
Thanks.
Lianbo
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
RELEASE: 4.18.0-80.el8.x86_64
crash> p read_expire
No symbol "read_expire" in current context.
p: gdb request failed: p read_expire
Thanks,
Kazu
>
> const int bfq_timeout = HZ / 8;
>
> RELEASE: 4.18.0-80.el8.x86_64
>
> crash> p bfq_timeout
> bfq_timeout = $1 = 125
>
> This value has not been changed since its introduction (aee69d78dec0).
> Recent kernels configured with CONFIG_IOSCHED_BFQ=y can be covered with this?
>
> [0] https://listman.redhat.com/archives/crash-utility/2021-April/msg00026.html
>
> Thanks,
> Kazu
>
>
>>
>> [2] hardcode hz with the value 1000 (if kernel version > 2.6.0)
>>
>> [3] get the hz value from vmcore, but that relies on kernel config
>> such as CONFIG_IKCONFIG, etc.
>>
>> [4] Use sysconf(_SC_CLK_TCK) on some arches, not all arches.
>> See the macro definition of HZ in defs.h
>>
>> There seems to be no perfect solution. Any ideas?
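For reference, option [4] boils down to a call like the one in this minimal sketch. The wrapper name is hypothetical. One caveat, stated as my understanding rather than something confirmed in this thread: sysconf(_SC_CLK_TCK) reports the clock-tick rate exposed to user space on the host running crash (typically 100 on Linux), which is not necessarily the CONFIG_HZ of the kernel that produced the vmcore.

```c
#include <unistd.h>

/*
 * Return the clock ticks per second reported by the host,
 * or -1 if sysconf() fails.
 */
long host_clock_ticks(void)
{
	return sysconf(_SC_CLK_TCK);
}
```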
>>
>>
>> Thanks.
>> Lianbo
>>
>>> Thanks,
>>> Kazu
>>>
>>>>
>>>> I tested this on ppc64le and x86_64 and the hz value reported is 1000,
>>>> whereas the kernel CONFIG_HZ_100 is set to Y. See some logs below:
>>>>
>>>> crash> help -m
>>>> flags: 124000f5 (KSYMS_START|MACHDEP_BT_TEXT|VM_4_LEVEL|VMEMMAP|VMEMMAP_AWARE|PHYS_ENTRY_L4|SWAP_ENTRY_L4|RADIX_MMU|OPAL_FW)
>>>> kvbase: c000000000000000
>>>> identity_map_base: c000000000000000
>>>> pagesize: 65536
>>>> pageshift: 16
>>>> pagemask: ffffffffffff0000
>>>> pageoffset: ffff
>>>> stacksize: 16384
>>>> hz: 1000
>>>> mhz: 2800
>>>>
>>>> [host@rhel7]$ grep CONFIG_HZ_100= redhat/configs/kernel-3.10.0-ppc64le.config
>>>> CONFIG_HZ_100=y
>>>>
>>>> Fix the same by using the sysconf(_SC_CLK_TCK) value instead of the
>>>> hardcoded HZ values depending on kernel versions.
>>>>
>>>
>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://listman.redhat.com/mailman/listinfo/crash-utility