Re: [Crash-utility] help debug number of CPU detect failure

Thursday, 5 March 2020

On Thu, Mar 5, 2020 at 12:19 PM Santosh <ysan99(a)gmail.com> wrote:
...

 On Wed, Mar 4, 2020 at 2:49 PM Dave Anderson <anderson(a)prospeed.net> wrote:
 >
 > > Hello List,
 > >
 > > I have two ELF coredumps from two different Hyper-V VMs, generated by this
 > > tool (https://github.com/Azure/azure-linux-utils/tree/master/vm2core).
 > >
 > > Crash works with one of these coredumps but does not work with the other.
 > >
 > > I've placed the output generated by the crash tool here:
 > >
 > > Not ok with crash:
 > > ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
 > > vm1_numa_4gb_5cpu.coredump --kaslr 600000 -m phys_base=4355784704 -d8
 > >  https://raw.githubusercontent.com/santoshx/temp/master/notok_with_crash.txt
 > >
 > > Ok with crash:
 > >  ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
 > > vm1_nonuma_4gb_5cpu.coredump --kaslr 3c00000 -m phys_base=2344615936 -d8
 > >  https://raw.githubusercontent.com/santoshx/temp/master/ok_with_crash.txt
 > >
 > >
 > > The problem I see is that, in the non-working case, crash fails to detect the
 > > correct cpu_possible_mask:
 > >
 > > Relevant part of $ diff ok_with_crash.txt notok_with_crash.txt:
 > >
 > > <   cpu_active_mask: cpus: 0 1 2 3 4
 > > < FREEBUF(0)
 > > < <readmem: ffffffff86039f40, KVADDR, "pv_init_ops", 8, (ROE), 7ffe01722870>
 > > < <read_kdump: addr: ffffffff86039f40 paddr: 91c39f40 cnt: 8>
 > > < read_netdump: addr: ffffffff86039f40 paddr: 91c39f40 cnt: 8 offset: 91c3a760
 > > ---
 > >> <readmem: ffffffff826f2b60, KVADDR, "possible", 1024, (ROE), 5638a35a2280>
 > >> <read_kdump: addr: ffffffff826f2b60 paddr: 1060f2b60 cnt: 1024>
 > >> read_netdump: addr: ffffffff826f2b60 paddr: 1060f2b60 cnt: 1024 offset: fe0f3380
 > >> cpu_possible_mask: cpus: 3 4 5 6 8 13 14 18 20 21 22 26 28 29 30 33 36
 > >> 37 38 48 49 52 53 54 56 59 60 61 62 64 65 68 69 70 72 73 74 75 76 78 82
 > >> 83 85 86 90 91 93 94 96 99 101 102 104 105 108 109 110 114 116 117 118
 > >> 123 124 125 126 128 133 134 138 140 141 142 146 148 149 150 153 156 157
 > >> 158 168 169 172 173 174 176 179 180 181 182 184 185 188 189 190 192 193
 > >> 194 195 196 198 200 202 205 206 211 212 213 214 216 219 221 222 226 228
 > >> 229 230 232 233 234 235 236 238 242 243 245 246 248 251 253 254 256 257
 > >> 260 261 262 266 268 269 270 275 276 277 278 280 285 286 290 292 293 294
 > >> 298 300 301 302 305 308 309 310 320 321 324 325 326 328 331 332 333 334
 > >> 336 337 340 341 342 344 345 346 347 348 350 352 354 357 358 361 362 363
 > >> 365 366 370 372 373 374 376 378 381 382 385 388 389 390 392 393 394 395
 > >> 396 398 402 403 405 406 408 411 413 414 416 417 420 421 422 426 428 429
 > >> 430 435 436 437 438 440 445 446 450 452 453 454 458 460 461 462 465 468
 > >> 469 470 480 481 484 485 486 488 491 492 493 494 496 497 500 501
 > >> 502 504 505 506 507 508 510 514 515 517 518 520 523 525 526 528 529 532
 > >> 533 534 538 540 541 542 547 548 549
 > >
 > > I'm trying to find where the problem is: in the crash tool, or in the tool that
 > > generated the ELF coredumps?
 >
 > I suspect that it's a problem with either the --kaslr offset and/or
 > the phys_base value that you have used.

 Is there a method to find or print the KASLR offset and phys_base on a running Linux system?
Got it.

crash> p vmcoreinfo_data+1600
$12 = (unsigned char *) 0xffff90ff7cdc3640
"poison)=22\nNUMBER(PG_head_mask)=32768\nNUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128\nNUMBER(HUGETLB_PAGE_DTOR)=2\nNUMBER(phys_base)=-499122176\nSYMBOL(init_top_pgt)=ffffffffa200a000\nSYMBOL(node_data)=ffffffffa225d780\nLENGTH(node_data)=1024\nKERNELOFFSET=1fc00000\nNUMB"...

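The two values crash needs can be read straight out of the vmcoreinfo text printed above: KERNELOFFSET is the KASLR offset (the kernel writes it in hex, without a 0x prefix) and NUMBER(phys_base) is phys_base (written as a signed decimal long). A negative phys_base, as here, is not by itself a sign of corruption: on x86_64 it can occur when the kernel's physical load address is lower than its randomized virtual offset. Below is a minimal Python sketch, assuming the vmcoreinfo text has been copied out of crash; parse_vmcoreinfo and crash_args are hypothetical helpers, not part of crash.

```python
MASK64 = (1 << 64) - 1


def parse_vmcoreinfo(text):
    """Split vmcoreinfo 'KEY=value' lines into a dict (hypothetical helper)."""
    # crash/gdb prints embedded newlines as the two characters "\n";
    # accept either that escaped form or real newlines.
    text = text.replace("\\n", "\n")
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip fragments without '=', e.g. a truncated trailing key
            info[key] = value
    return info


def crash_args(text):
    """Return (kaslr_offset, phys_base) as ints (hypothetical helper)."""
    info = parse_vmcoreinfo(text)
    kaslr = int(info["KERNELOFFSET"], 16)           # hex, no 0x prefix
    phys_base = int(info["NUMBER(phys_base)"], 10)  # signed decimal long
    return kaslr, phys_base


# Excerpt of the vmcoreinfo string printed above, in crash's escaped form.
SAMPLE = (
    "NUMBER(phys_base)=-499122176\\n"
    "SYMBOL(init_top_pgt)=ffffffffa200a000\\n"
    "KERNELOFFSET=1fc00000\\n"
)

kaslr, phys_base = crash_args(SAMPLE)
print(f"--kaslr {kaslr:x}")  # --kaslr 1fc00000
# phys_base: -499122176 (decimal), 0xffffffffe2400000 (as unsigned 64-bit)
print(f"phys_base: {phys_base} (decimal), {phys_base & MASK64:#x} (as unsigned 64-bit)")
```

Both decimal and unsigned-hex forms are printed because which radix crash expects after `-m phys_base=` is not confirmed here; the invocations earlier in this thread pass it in decimal.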
...

 >
 > It appears that the read of the cpu_possible mask is not using the
 > correct virtual address, or perhaps the wrong physical address, and
 > as a result it is trying to translate bogus data.  In fact, the full
 > output txt file shows that everything that it reads is garbage, e.g.,
 > the cpu masks, the utsname data structure, the linux_banner string, etc.
 >
 > Dave
 >
 >
 > --
 > Crash-utility mailing list
 > Crash-utility(a)redhat.com
 > https://www.redhat.com/mailman/listinfo/crash-utility
 > 
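Dave's point about the virtual versus physical address can be checked by hand against the -d8 logs. On x86_64, an address inside the kernel image mapping translates as paddr = vaddr - __START_KERNEL_map + phys_base, where __START_KERNEL_map is 0xffffffff80000000. The minimal Python sketch below applies that formula to the two read_kdump lines quoted in the diff, assuming the phys_base for each is the value passed with -m on the corresponding command line. kernel_image_paddr is a hypothetical helper, not a crash function.

```python
START_KERNEL_MAP = 0xFFFFFFFF80000000  # x86_64 __START_KERNEL_map
MASK64 = (1 << 64) - 1


def kernel_image_paddr(vaddr, phys_base):
    """Translate a kernel-image virtual address to physical (hypothetical helper)."""
    if vaddr < START_KERNEL_MAP:
        raise ValueError(f"{vaddr:#x} is outside the kernel image mapping")
    # Mask to 64 bits so a negative phys_base wraps like unsigned C arithmetic.
    return (vaddr - START_KERNEL_MAP + phys_base) & MASK64


# (virtual address, phys_base given with -m, paddr logged by read_kdump)
CASES = {
    "ok dump, pv_init_ops": (0xFFFFFFFF86039F40, 2344615936, 0x91C39F40),
    "not-ok dump, possible": (0xFFFFFFFF826F2B60, 4355784704, 0x1060F2B60),
}

for name, (vaddr, phys_base, logged) in CASES.items():
    computed = kernel_image_paddr(vaddr, phys_base)
    print(f"{name}: computed {computed:#x}, logged {logged:#x}, "
          f"match={computed == logged}")
```

Both computed addresses match the logged ones, so crash is translating consistently with the arguments it was given. This fits Dave's suspicion that the --kaslr and/or phys_base values themselves are wrong for the NUMA VM's dump, rather than the virtual-to-physical arithmetic. It does not check how the dump maps physical addresses to file offsets.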
