Thanks for that extra info Dave - it's super helpful!
Cheers,
Nuno
> -----Original Message-----
> From: Dave Anderson <anderson(a)redhat.com>
> Sent: Thursday, June 6, 2019 7:22 AM
> To: Nuno Das Neves <Nuno.Das(a)microsoft.com>
> Cc: crash-utility(a)redhat.com
> Subject: Re: [Crash-utility] [PATCH] Fix unsigned signed comparison causing
> segfault for small VMCOREINFO notes
>
> ----- Original Message -----
> > > -----Original Message-----
> > > From: Dave Anderson <anderson(a)redhat.com>
> > > Sent: Wednesday, June 5, 2019 2:56 PM
> > > To: Nuno Das Neves <Nuno.Das(a)microsoft.com>
> > > Cc: Nuno Das Neves <Nuno.Das(a)microsoft.com>
> > > Subject: Fwd: [Crash-utility] [PATCH] Fix unsigned signed comparison
> > > causing segfault for small VMCOREINFO notes
> > >
> > >
> > >
> > > ----- Forwarded Message -----
> > > From: "Dave Anderson" <anderson(a)redhat.com>
> > > To: "Discussion list for crash utility usage, maintenance and
> > > development"
> > > <crash-utility(a)redhat.com>
> > > Sent: Wednesday, June 5, 2019 4:18:02 PM
> > > Subject: Re: [Crash-utility] [PATCH] Fix unsigned signed comparison
> > > causing
> > > segfault for small VMCOREINFO notes
> > >
> > >
> > >
> > > ----- Original Message -----
> > > > Hi,
> > > >
> > > > This is a fix for a signed/unsigned comparison bug in
> > > > vmcoreinfo_read_string.
> > > > The bug causes a segmentation fault if size_vmcoreinfo + 1 is
smaller
> > > > than
> > > > the length of the key string passed in.
> > >
> > > I suppose that's true, but can you describe the instance where that
> > > actually happened?
> >
> > I'm debugging some issues with vm2core
> > (github dot com slash Azure slash azure-linux-utils), a tool for
> > converting Hyper-V snapshots to elf core files that can be analyzed with
> > Crash.
> > There was a problem where versions of Crash newer than 7.1.5 crashed with
> > elf
> > files generated by vm2core due to this issue.
> >
> > > Can you show the actual note contents as shown by "help -D"?
> >
> > Elf64_Nhdr:
> > n_namesz: 11 ("VMCOREINFO")
> > n_descsz: 10
> > n_type: 0 (unused)
> > VMCOREINFO
> >
> > So it seems the note doesn't really contain valid data as defined in
> > Documentation/kdump/vmcoreinfo.txt.
> >
> > I've done some additional testing and it appears this note isn't
needed
> > by
> > Crash at all - I can simply patch vm2core so it doesn't add the note.
> > So, I suppose this fix isn't required to solve my particular issue.
>
> I don't think you should avoid adding the VMCOREINFO note contents from the
> kernel.
>
> While the vast majority of VMCOREINFO items are only used by
> makedumpfile(8),
> several items are used by the crash utility. For example, the x86_64
> requires
> things like the kernel's phys_offset value (at least if it's non-zero),
> either
> the relocated "_stext" symbol value or KERNELOFFSET value for KASLR
> kernels,
> possibly the KERNEL_IMAGE_SIZE, and to be able to recognize 5-level page
> tables.
>
> >
> > However, could the VMCOREINFO note legitimately contain a single field
> > e.g. "PAGE_SIZE=4096"?
> > If so, this is just 14 characters; 14 + 1 <
> > strlen("NUMBER(KERNEL_IMAGE_SIZE)") - this string is used as the
argument
> > to
> > vmcoreinfo_read_string on line 202 of x86_64.c.
>
> No, it would never contains a single field. Here's a typical dump on a
> 5.0.0 x86_64
> kernel:
>
> crash> help -D
> ... [ cut ] ...
>
> n_namesz: 11 ("VMCOREINFO")
> n_descsz: 1980
> n_type: 0 (unused)
> OSRELEASE=5.0.0+
> PAGESIZE=4096
> SYMBOL(init_uts_ns)=ffffffffa9615540
> SYMBOL(node_online_map)=ffffffffa976e068
> SYMBOL(swapper_pg_dir)=ffffffffa960e000
> SYMBOL(_stext)=ffffffffa8200000
> SYMBOL(vmap_area_list)=ffffffffa9666230
> SYMBOL(mem_section)=ffff90adbbff8000
> LENGTH(mem_section)=2048
> SIZE(mem_section)=16
> OFFSET(mem_section.section_mem_map)=0
> SIZE(page)=64
> SIZE(pglist_data)=14016
> SIZE(zone)=1280
> SIZE(free_area)=72
> SIZE(list_head)=16
> SIZE(nodemask_t)=8
> OFFSET(page.flags)=0
> OFFSET(page._refcount)=52
> OFFSET(page.mapping)=24
> OFFSET(page.lru)=8
> OFFSET(page._mapcount)=48
> OFFSET(page.private)=40
> OFFSET(page.compound_dtor)=16
> OFFSET(page.compound_order)=17
> OFFSET(page.compound_head)=8
> OFFSET(pglist_data.node_zones)=0
> OFFSET(pglist_data.nr_zones)=13344
> OFFSET(pglist_data.node_start_pfn)=13352
> OFFSET(pglist_data.node_spanned_pages)=13368
> OFFSET(pglist_data.node_id)=13376
> OFFSET(zone.free_area)=192
> OFFSET(zone.vm_stat)=1088
> OFFSET(zone.spanned_pages)=112
> OFFSET(free_area.free_list)=0
> OFFSET(list_head.next)=0
> OFFSET(list_head.prev)=8
> OFFSET(vmap_area.va_start)=0
> OFFSET(vmap_area.list)=48
> LENGTH(zone.free_area)=11
> SYMBOL(log_buf)=ffffffffa964b250
> SYMBOL(log_buf_len)=ffffffffa964b24c
> SYMBOL(log_first_idx)=ffffffffa9d71668
> SYMBOL(clear_idx)=ffffffffa9d71634
> SYMBOL(log_next_idx)=ffffffffa9d71658
> SIZE(printk_log)=16
> OFFSET(printk_log.ts_nsec)=0
> OFFSET(printk_log.len)=8
> OFFSET(printk_log.text_len)=10
> OFFSET(printk_log.dict_len)=12
> LENGTH(free_area.free_list)=4
> NUMBER(NR_FREE_PAGES)=0
> NUMBER(PG_lru)=4
> NUMBER(PG_private)=13
> NUMBER(PG_swapcache)=10
> NUMBER(PG_swapbacked)=19
> NUMBER(PG_slab)=9
> NUMBER(PG_head_mask)=65536
> NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
> NUMBER(HUGETLB_PAGE_DTOR)=2
> NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
> NUMBER(phys_base)=-497025024
> SYMBOL(init_top_pgt)=ffffffffa960e000
> NUMBER(pgtable_l5_enabled)=0
> SYMBOL(node_data)=ffffffffa976d680
> LENGTH(node_data)=64
> KERNELOFFSET=27200000
> NUMBER(KERNEL_IMAGE_SIZE)=1073741824
> NUMBER(sme_mask)=0
> CRASHTIME=1555324290
>
> I would guess the problem is how much awareness your vm2core facility has
> concerning the target kernel, e.g., can it can find the physical
> memory containing the VMCOREINFO data. On the target system,
> the physical address and a page-granularity size can be located here:
>
> $ cat /sys/kernel/vmcoreinfo
> 2a21197c0 1000
> $
>
> The kernel symbols of the data is "vmcoreinfo_data", and the actual
> size of the note is "vmcoreinfo_size". Or in 4.19 and later kernels,
> /proc/kcore now has the VMCOREINFO note appended.
>
> The KVM developers went through the same process as you are, and they
> eventually implemented a mechanism where they pass the vmcoreinfo data from
> the target kernel to the KVM host that is doing a "virsh dump".
>
> In any case, even though it's theoretically impossible to have a
> zero-length
> or very small VMCOREINFO note, your patch does makes logical sense, so I
> will
> apply it.
>
> Thanks,
> Dave
>
> > Regards,
> > Nuno
> >
> > >
> > > Thanks,
> > > Dave
> > >
> > >
> > > >
> > > > Signed-off-by: Nuno Das Neves <nudasnev(a)microsoft.com>
> > > > ---
> > > > netdump.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/netdump.c b/netdump.c
> > > > index 40f9cde..d257ecd 100644
> > > > --- a/netdump.c
> > > > +++ b/netdump.c
> > > > @@ -1838,7 +1838,7 @@ vmcoreinfo_read_string(const char *key)
> > > > return NULL;
> > > >
> > > > /* the '+ 1' is the equal sign */
> > > > - for (i = 0; i < (size_vmcoreinfo - key_length + 1); i++) {
> > > > + for (i = 0; i < (int)(size_vmcoreinfo - key_length + 1); i++) {
> > > > /*
> > > > * We must also check if we're at the beginning of VMCOREINFO
> > > > * or the separating newline is there, and of course if we
> > > > --
> > > > 1.8.3.1
> > > >
> > > > --
> > > > Crash-utility mailing list
> > > > Crash-utility(a)redhat.com
> > > >
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.red...
> > >
>
utility&data=02%7C01%7CNuno.Das%40microsoft.com%7C3a91428c54b34ed7855d08d6ea008afe%7C72f988bf86f141af91ab2d7cd0
> > >
11db47%7C1%7C0%7C636953685382744097&sdata=BVID2b0QYkDmBip0bzPjX%2Fl4vltJtf78JrTE7pnqSIM%3D&reserved=0
> > > >
> >