Re: [Crash-utility] Inaccurate timestamps in log -T output

Wednesday, 22 September 2021

Hi Martin,

Thank you for the detailed report and interesting idea.

The "log -T" option imitates the "dmesg -T" command, and it also
looks inaccurate in nature and has a warning in its help text.

# systemctl restart psacct ; date
Wed Sep 22 17:30:49 JST 2021
# dmesg -T | tail -n 1
[Wed Sep 22 17:30:22 2021] Process accounting resumed
# dmesg --help | grep -- -T
 -T, --ctime                 show human-readable timestamp (may be inaccurate!)

Maybe at least we should add a similar warning to the help page.

Personally I would prefer users to use it with the acknowledgement
of that inaccuracy like dmesg -T, rather than doing a crash specific
adjustment..

Thanks,
Kazu

-----Original Message-----
...
 From: crash-utility-bounces(a)redhat.com
<crash-utility-bounces(a)redhat.com&gt; On Behalf Of Moore, Martin
 (Linux ERT)
 Sent: Tuesday, September 21, 2021 1:58 AM
 To: Discussion list for crash utility usage, maintenance and development
<crash-utility(a)redhat.com&gt;
 Subject: [Crash-utility] Inaccurate timestamps in log -T output

 Hello,

 As the current maintainer of the PyKdump extension, I have been investigating a reported
discrepancy between
 the timestamps generated by crash’s log -T and PyKdump’s tslog command, which provides
the same functionality
 of timestamping the kernel log entries.  What I have found is that *both* log -T and
tslog generate some
 inaccurate timestamps on vmcores with more than a small amount of uptime, but in
different ways.
 Specifically, log -T timestamps are accurate at boot time but become less accurate later
in the log, while
 tslog timestamps are just the opposite: they’re accurate around crash time but less
accurate earlier in
 the log.

 Here’s an example from a panic on a kernel 3.10.0-957.5.1 system with a moderate uptime:

         DATE: Tue Jul  6 09:50:58 +03 2021

       UPTIME: 32 days, 16:29:29

        PANIC: "BUG: unable to handle kernel "

 Crash correctly computes the boot time based on the date (crash time) and uptime shown
above.  From “help
 -k”:

      boot_date: Thu Jun  3 17:21:29 +03 2021

 log -T shows the correct timestamps of the boot messages:

 [Thu Jun  3 17:21:29 +03 2021] Initializing cgroup subsys cpuset

 [Thu Jun  3 17:21:29 +03 2021] Initializing cgroup subsys cpu

 [Thu Jun  3 17:21:29 +03 2021] Initializing cgroup subsys cpuacct

 However, the timestamps of the BUG and surrounding messages are off by 150 seconds or 2.5
minutes (09:53:28,
 but the crash occurred at 09:50:58):

 [Tue Jul  6 09:53:28 +03 2021] BUG: unable to handle kernel

 [Tue Jul  6 09:53:28 +03 2021] NULL pointer dereference at           (null)

 [Tue Jul  6 09:53:28 +03 2021] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40

 For comparison, tslog has the correct timestamps around the crash:

 2021-07-06 09:50:58 [2824319.188837] BUG: unable to handle kernel

 2021-07-06 09:50:58 [2824319.193440] NULL pointer dereference at           (null)

 2021-07-06 09:50:58 [2824319.197853] IP: [<ffffffff91cd6e66>]
scsi_uninit_cmd+0x26/0x40

 But it’s off by the same 2.5-minute interval during the boot messages (17:18:59, but it
should be 17:21:29):

 2021-06-03 17:18:59 [    0.000000] Initializing cgroup subsys cpuset

 2021-06-03 17:18:59 [    0.000000] Initializing cgroup subsys cpu

 2021-06-03 17:18:59 [    0.000000] Initializing cgroup subsys cpuacct

 The reasons for these inaccuracies are:

 1.       log -T computes boot time as (crash time minus uptime), and then adds the
seconds part of the time
 stored in each log entry to generate the entry’s timestamp.

 2.       tslog computes a base time as (crash time minus seconds part of the final log
entry) and then adds
 the seconds part of the time stored in each log entry.

 3.       The key to the problem: printk() stores the log entry timestamp based on a call
to local_clock(),
 which provides the raw time in the local CPU, which is not adjusted by NTP.  As such,
inaccuracy accumulates
 as the clock drifts over time; in the example above, the discrepancy has reached 150
seconds by the time
 of the crash.

 I have seen this now in numerous vmcores from kernels 3.0 through 4.12, which is the
latest version I’ve
 checked.  The discrepancy exists in all of them with more than a few hours of uptime and
can be positive
 or negative.  In one vmcore it was over 13 hours after 239 days of uptime.

 From the PyKdump perspective, I could live with the current tslog behavior because it’s
accurate for log
 entries near the crash time, which are generally the most useful for crash analysis.  But
in an attempt
 to improve it, I’m testing an experimental version of tslog that smooths out the
discrepancy.  It computes
 the boot time the same way that crash does (crash time minus uptime) and a scale factor
for the time discrepancy
 equal to (uptime)/(seconds part of final log entry).  Then each log entry’s timestamp is
computed as (boot
 time) + (seconds part of log entry)*(scale factor).  This version results in correct
timestamps at both
 ends of the log:

 2021-06-03 17:21:29 [    0.000000] Initializing cgroup subsys cpuset

 2021-06-03 17:21:29 [    0.000000] Initializing cgroup subsys cpu

 2021-06-03 17:21:29 [    0.000000] Initializing cgroup subsys cpuacct

     :

 2021-07-06 09:50:58 [2824319.188837] BUG: unable to handle kernel

 2021-07-06 09:50:58 [2824319.193440] NULL pointer dereference at           (null)

 2021-07-06 09:50:58 [2824319.197853] IP: [<ffffffff91cd6e66>]
scsi_uninit_cmd+0x26/0x40

 This is something of a hack since there’s no guarantee that the clock will drift
uniformly during the system
 uptime, so timestamps in the middle of the log may still be somewhat inaccurate (but
probably more accurate
 than without this adjustment).

 Regards,

 Martin

 Explore the HPE digital support experience: https://support.hpe.com
<https://support.hpe.com>

 Martin Moore
 Linux Engineering Resolution Team

 HPE Pointnext Services
 Hewlett Packard Enterprise

 Martin.Moore(a)hpe.com <mailto:Martin.Moore@hpe.com>
 8AM-5PM EDT (UTC-4) Monday-Friday
 Manager: Becky McBride (becky.mcbride(a)hpe.com <mailto:becky.mcbride@hpe.com> )

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Crash-utility] Inaccurate timestamps in log -T output