On 2021-04-23 14:41, lijiang wrote:
On 2021-04-22 22:26, lijiang wrote:
> On 2021-04-22 17:33, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> -----Original Message-----
>>> -----Original Message-----
>>>> On 2021-01-12 16:24, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>>> Hi Bhupesh,
>>>>>
>>>>> -----Original Message-----
>>>>>> We have hard-coded the HZ value for some ARCHs to either 1000 or 100
>>>>>> (mainly for kernel versions > 2.6.0), which causes 'help -m' to show
>>>>>> an incorrect hz value for various architectures.
>>>>>
>>>>> Good catch, but it seems crash uses (cfq_slice_async * 25) for machdep->hz
>>>>> if it exists (please see task_init()). RHEL7 has it, but RHEL8 does not.
>>>>> What do you see on RHEL8 for x86_64 with your patch?
>>>>>
>>>>
>>>> The symbol 'cfq_slice_async' has been removed from the upstream kernel by:
>>>> f382fb0bcef4 ("block: remove legacy IO schedulers")
>>>>
>>>> And RHEL8 also removed it.
>>>>
>>>>> We should first look for an alternative approach similar to the current one.
>>>>>
>>>>
>>>> Currently, there are several ways to get the value of HZ as below:
>>>>
>>>> [1] calculate hz via the symbol 'cfq_slice_async'
>>>>     But this symbol has been removed from the upstream kernel.
>>>
>>> According to [0] below, 'cfq_slice_async' cannot be used for the HZ
>>> calculation on 4.8 and later kernels. I have not found a perfect
>>> alternative, but how about using 'bfq_timeout' for 4.12 and later,
>>> including RHEL8?
>>
>> e.g. like this:
>>
>> --- a/task.c
>> +++ b/task.c
>> @@ -417,7 +417,16 @@ task_init(void)
>>  
>>  	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
>>  
>> -	if (symbol_exists("cfq_slice_async")) {
>> +	if (symbol_exists("bfq_timeout")) {
>> +		uint bfq_timeout;
>> +		get_symbol_data("bfq_timeout", sizeof(int), &bfq_timeout);
>> +		if (bfq_timeout) {
>> +			machdep->hz = bfq_timeout * 8;
>> +			if (CRASHDEBUG(2))
>> +				fprintf(fp, "bfq_timeout exists: setting hz to %d\n",
>> +					machdep->hz);
>> +		}
>> +	} else if (symbol_exists("cfq_slice_async")) {
>>  		uint cfq_slice_async;
>>  
>>  		get_symbol_data("cfq_slice_async", sizeof(int),
>>
>>
>> Lianbo, could you try this on ppc64le if it looks good?
>>
> Sure. On my ppc64le machine, crash reported hz as 96 after applying the
> above patch. The reason is that the kernel calculates the value of
> bfq_timeout as below:
>
> bfq_timeout = HZ / 8;
>
> The actual value of HZ is 100, so bfq_timeout = 100 / 8 = 12 (integer
> division drops the remainder), but in crash we calculate the value of HZ as:
>
> HZ = bfq_timeout * 8 = 12 * 8 = 96
>
> This is not the result we expected.
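To make the truncation concrete, here is a minimal C sketch that reproduces the two formulas quoted above; hz_via_bfq_timeout() is a hypothetical name used only for illustration, not crash code. For the usual CONFIG_HZ choices, 100 comes back as 96, 250 as 248 and 300 as 296; only 1000, a multiple of 8, survives the round trip.

```c
/*
 * Hypothetical illustration (not crash code): the kernel initialises
 * bfq_timeout as HZ / 8 using C integer division, and the proposed
 * patch inverts it as bfq_timeout * 8. Any remainder HZ % 8 is lost.
 */
static int hz_via_bfq_timeout(int hz)
{
	int bfq_timeout = hz / 8;	/* kernel side: truncating division */

	return bfq_timeout * 8;		/* crash side: cannot restore HZ % 8 */
}
```

This is why any "fraction of HZ" variable is a poor source: the inverse is only exact when HZ happens to be divisible by the divisor.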
>
>> btw, I thought 'read_expire' was better than 'bfq_timeout' because it
>> was introduced in 2.6.16 and has not changed since, but most of the kernels (vmlinux)
>
> Sounds good. Unfortunately, 'read_expire' is a static variable in the kernel,
> so we cannot get it directly through a symbol lookup. Maybe we should try to
> find that static kernel variable in another way.
>
> If possible, I would prefer to use 'write_expire' to calculate the value of HZ
> in crash as below, which avoids the above issue and gives a correct result.
>
> HZ = write_expire / 5;
>
> /*
> * source: block/mq-deadline.c
> */
> static const int write_expire = 5 * HZ;
>
> For example:
> +	if (symbol_exists("write_expire")) {  ----> Here, it failed; maybe we can try to find the symbol in another way.
> +		uint write_expire;
> +		get_symbol_data("write_expire", sizeof(int), &write_expire);
> +		if (write_expire) {
> +			machdep->hz = write_expire / 5;
> +			if (CRASHDEBUG(2))
> +				fprintf(fp, "write_expire exists: setting hz to %d\n",
> +					machdep->hz);
> +		}
> +	} else
>
>> that I have do not have a symbol for it. (some optimization?)
>>
> I can get the values of 'read_expire' and 'write_expire' on the latest RHEL8
> or later kernels:
>
> crash> p read_expire
> $1 = 50
> crash> p write_expire
> $2 = 500
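The write_expire idea works where bfq_timeout did not because that variable is a multiple of HZ rather than a fraction of it, so dividing it back is exact. A minimal sketch with a hypothetical helper, hz_from_multiple(), that also rejects values that are not exact multiples (a sign the symbol does not hold what we expect):

```c
/*
 * Hypothetical helper (not part of crash): recover HZ from a kernel
 * variable initialised as "multiplier * HZ", e.g. write_expire = 5 * HZ.
 * Returns 0 if the value is not an exact positive multiple, so a caller
 * could fall back to another method instead of trusting a bad result.
 */
static int hz_from_multiple(int value, int multiplier)
{
	if (value <= 0 || multiplier <= 0 || value % multiplier != 0)
		return 0;
	return value / multiplier;
}
```

With the values printed above, hz_from_multiple(500, 5) returns 100. The same helper would cover inet_peer_minttl with a multiplier of 120.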
>
> Thanks.
> Lianbo
>
What do you think about the following change? It works for me.
/*
* source: net/ipv4/inetpeer.c
* int inet_peer_minttl __read_mostly = 120 * HZ; /* TTL under high load: 120 sec */
*/
diff --git a/task.c b/task.c
index 423cd45..4af3ef3 100644
--- a/task.c
+++ b/task.c
@@ -417,7 +417,17 @@ task_init(void)
 	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
-	if (symbol_exists("cfq_slice_async")) {
+	if (symbol_exists("inet_peer_minttl")) {
+		uint inet_peer_minttl;
+		get_symbol_data("inet_peer_minttl", sizeof(int), &inet_peer_minttl);
+		if (inet_peer_minttl) {
+			machdep->hz = inet_peer_minttl / 120;
+			if (CRASHDEBUG(2))
+				fprintf(fp, "inet_peer_minttl exists: setting hz to %d\n",
+					machdep->hz);
+		}
+	} else if (symbol_exists("cfq_slice_async")) {
 		uint cfq_slice_async;
Also, I would prefer to replace 'cfq_slice_async' with 'inet_peer_minttl'
entirely, as below. The reason is that it has hardly changed since v2.6.12-rc2,
and the variable lives in net/ipv4/inetpeer.c, which is built in most kernel
configurations. What's your opinion?
diff --git a/task.c b/task.c
index 423cd454502b..5994fe2b7351 100644
--- a/task.c
+++ b/task.c
@@ -417,18 +417,18 @@ task_init(void)
 	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
-	if (symbol_exists("cfq_slice_async")) {
-		uint cfq_slice_async;
+	if (symbol_exists("inet_peer_minttl")) {
+		int inet_peer_minttl;
-		get_symbol_data("cfq_slice_async", sizeof(int),
-			&cfq_slice_async);
+		get_symbol_data("inet_peer_minttl", sizeof(int),
+			&inet_peer_minttl);
-		if (cfq_slice_async) {
-			machdep->hz = cfq_slice_async * 25;
+		if (inet_peer_minttl) {
+			machdep->hz = inet_peer_minttl / 120;
 			if (CRASHDEBUG(2))
 				fprintf(fp,
-				    "cfq_slice_async exists: setting hz to %d\n",
+				    "inet_peer_minttl exists: setting hz to %d\n",
 					machdep->hz);
 		}
 	}
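The main difference between the two versions is whether cfq_slice_async is kept as a fallback. The following self-contained C sketch models the selection logic of the first version against a mock symbol table. struct mock_symbol, mock_lookup() and guess_hz() are all hypothetical; they stand in for crash's symbol_exists() and get_symbol_data() only so the ordering can be run in isolation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the vmcore symbol table. In crash the real
 * calls are symbol_exists() and get_symbol_data(); this mock only exists
 * so the selection logic can be run on its own.
 */
struct mock_symbol {
	const char *name;
	int value;
};

/* Return 1 and store the value if name is present in tab[0..n-1]. */
static int mock_lookup(const struct mock_symbol *tab, size_t n,
		       const char *name, int *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, name) == 0) {
			*out = tab[i].value;
			return 1;
		}
	}
	return 0;
}

/*
 * Mirrors the if / else-if order of the first proposed change:
 * inet_peer_minttl (120 * HZ) wins if present, otherwise the legacy
 * cfq_slice_async (HZ / 25). 0 means "no override": the caller keeps
 * whatever default hz it already had.
 */
static int guess_hz(const struct mock_symbol *tab, size_t n)
{
	int v;

	if (mock_lookup(tab, n, "inet_peer_minttl", &v))
		return v > 0 ? v / 120 : 0;
	else if (mock_lookup(tab, n, "cfq_slice_async", &v))
		return v > 0 ? v * 25 : 0;
	return 0;
}
```

For instance, a table containing only inet_peer_minttl = 120000 yields 1000, and one containing only cfq_slice_async = 4 yields 100.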
--
Thanks.
Lianbo
>> static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
>>
>> RELEASE: 4.18.0-80.el8.x86_64
>>
>> crash> p read_expire
>> No symbol "read_expire" in current context.
>> p: gdb request failed: p read_expire
>>
>> Thanks,
>> Kazu
>>
>>>
>>> const int bfq_timeout = HZ / 8;
>>>
>>> RELEASE: 4.18.0-80.el8.x86_64
>>>
>>> crash> p bfq_timeout
>>> bfq_timeout = $1 = 125
>>>
>>> This value has not changed since its introduction (aee69d78dec0).
>>> Could recent kernels configured with CONFIG_IOSCHED_BFQ=y be covered by this?
>>>
>>> [0] https://listman.redhat.com/archives/crash-utility/2021-April/msg00026.html
>>>
>>> Thanks,
>>> Kazu
>>>
>>>
>>>>
>>>> [2] hardcode hz with the value 1000 (if kernel version > 2.6.0)
>>>>
>>>> [3] get the hz value from vmcore, but that relies on kernel config
>>>> such as CONFIG_IKCONFIG, etc.
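For option [3], once the configuration text is available, extracting HZ is a plain text search. With CONFIG_IKCONFIG the config is stored gzip-compressed in the kernel image, so locating and decompressing it is a separate step that is not shown here. config_hz() is a hypothetical helper, not a crash function. It matches the numeric CONFIG_HZ= line rather than choice symbols such as CONFIG_HZ_100=y.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given kernel .config text, return the value of
 * CONFIG_HZ, or 0 if the option is absent.
 */
static int config_hz(const char *config)
{
	const char *key = "CONFIG_HZ=";
	const char *p = config;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept a match at the start of a line. */
		if (p == config || p[-1] == '\n')
			return (int)strtol(p + strlen(key), NULL, 10);
		p += strlen(key);
	}
	return 0;
}
```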
>>>>
>>>> [4] Use sysconf(_SC_CLK_TCK) on some arches, but not all arches.
>>>>     See the macro definition of HZ in defs.h.
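Option [4] is a one-liner, but it describes the machine running crash rather than the dumped kernel. sysconf(_SC_CLK_TCK) reports the user-space tick rate (USER_HZ), which is 100 on x86_64 and ppc64 regardless of the dumped kernel's CONFIG_HZ. A minimal sketch (host_clk_tck() is a hypothetical name):

```c
#include <unistd.h>

/*
 * Ticks per second as reported to user space on the host running crash.
 * Caveat: this is the host's USER_HZ, not the CONFIG_HZ of the kernel
 * that produced the vmcore.
 */
static long host_clk_tck(void)
{
	long ticks = sysconf(_SC_CLK_TCK);

	return ticks > 0 ? ticks : 0;	/* sysconf() returns -1 on error */
}
```

So this value only matches the dump when the dumped kernel's CONFIG_HZ happens to equal the host's USER_HZ.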
>>>>
>>>> There seems to be no perfect solution. Any ideas?
>>>>
>>>>
>>>> Thanks.
>>>> Lianbo
>>>>
>>>>> Thanks,
>>>>> Kazu
>>>>>
>>>>>>
>>>>>> I tested this on ppc64le and x86_64 and the hz value reported is 1000,
>>>>>> whereas the kernel has CONFIG_HZ_100 set to y. See some logs below:
>>>>>>
>>>>>> crash> help -m
>>>>>> flags: 124000f5
>>>>>> (KSYMS_START|MACHDEP_BT_TEXT|VM_4_LEVEL|VMEMMAP|VMEMMAP_AWARE|PHYS_ENTRY_L4|SWAP_ENTRY_L4|RADIX_MMU|OPAL_FW)
>>>>>> kvbase: c000000000000000
>>>>>> identity_map_base: c000000000000000
>>>>>> pagesize: 65536
>>>>>> pageshift: 16
>>>>>> pagemask: ffffffffffff0000
>>>>>> pageoffset: ffff
>>>>>> stacksize: 16384
>>>>>> hz: 1000
>>>>>> mhz: 2800
>>>>>>
>>>>>> [host@rhel7]$ grep CONFIG_HZ_100= redhat/configs/kernel-3.10.0-ppc64le.config
>>>>>> CONFIG_HZ_100=y
>>>>>>
>>>>>> Fix this by using the sysconf(_SC_CLK_TCK) value instead of the
>>>>>> hardcoded HZ values that depend on kernel versions.
>>>>>>
>>>>>
>>>
>>>
>>> --
>>> Crash-utility mailing list
>>> Crash-utility(a)redhat.com
>>> https://listman.redhat.com/mailman/listinfo/crash-utility
>>