On Wed, May 25, 2016 at 10:28:10AM -0400, Dave Anderson wrote:
----- Original Message -----
> On Tue, May 24, 2016 at 01:59:06PM -0400, Dave Anderson wrote:
> >
> > ----- Original Message -----
> > > Yet some issues, but ...
> > >
> >
> > Hi Takahiro,
> >
> > Here are my general comments on my testing of the v2 patch, followed
> > by a few comments in the patch itself.
>
> Thank you for your quick review/testing, again.
>
> > First, the combination of the new memory map layout and KASLR is somewhat
> > confusing. I am testing your patch on a 4.6.0-0.rc7.git2.1.fc25 kernel
> > that has this configuration:
>
> Right. I use the "kaslr kernel" or "kaslr-enabled kernel" in
two
> meanings (almost interchangeably):
> - kernel with a new memory map, that is v4.6 or later
> - kernel that has ability of KASLR (configured with CONFIG_RANDOMIZE_RAM)
>
> When we talked offline, you mentioned the possibility of backporting
> KASLR to older kernel, so I hesitated to use "v4.6 or later".
> But I think that we'd better use "v4.6 or later" for a clarification
> for now.
Actually I was thinking more along the lines of the new vmemmap stuff
being backported. But it looks like that would be impossible without
also bringing along "kimage_voffset".
Yeah, a sbustantial amount of code on mm code must be changed, and it is
impractical to backport them.
> > config-arm64:# CONFIG_RANDOMIZE_BASE is not set
> >
> > So KASLR doesn't really enter into the picture. But when bringing
> > up the crash session, it shows the "kaslr kernel" WARNING:
> >
> > # ./crash
> >
> > crash 7.1.5++
> > Copyright (C) 2002-2016 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty"
for
> > details.
> >
> > WARNING: kimage_voffset not identified for kaslr kernel
> > GNU gdb (GDB) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <
http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show
> > copying"
> > and "show warranty" for details.
> > This GDB was configured as "aarch64-unknown-linux-gnu"...
> >
> > KERNEL:
> > /usr/lib/debug/lib/modules/4.6.0-0.rc7.git2.1.fc25.aarch64/vmlinux
> > DUMPFILE: /dev/crash
> > CPUS: 8
> > DATE: Tue May 24 10:08:08 2016
> > UPTIME: 11 days, 18:32:41
> > LOAD AVERAGE: 0.17, 0.09, 0.12
> > TASKS: 197
> > NODENAME:
apm-mustang-ev3-36.khw.lab.eng.bos.redhat.com
> > RELEASE: 4.6.0-0.rc7.git2.1.fc25.aarch64
> > VERSION: #1 SMP Thu May 12 13:28:43 UTC 2016
> > MACHINE: aarch64 (unknown Mhz)
> > MEMORY: 16 GB
> > PID: 7556
> > COMMAND: "crash"
> > TASK: fffffe00beb45400 [THREAD_INFO: fffffe00beb98000]
> > CPU: 7
> > STATE: TASK_RUNNING (ACTIVE)
> >
> > crash>
> >
> > Why show that WARNING in this case? I understant that it's not
> > stashed during early initialization:
>
> If we don't know kimage_voffset, we can't analyze the contents of
> memory (live or kdump).
I understand. But clearly that's not the case with my system above that
does not have CONFIG_RANDOMIZE_BASE configured, so the warning message
is nonsensical/confusing. That's why I was speculating about perhaps
having kdump export a VMCOREINFO_CONFIG(RANDOMIZE_BASE) so that it is
readily apparent.
>
> > crash> help -m
> > flags: 104000c5
> >
(KSYMS_START|VM_L2_64K|VMEMMAP|IRQ_STACKS|MACHDEP_BT_TEXT|NEW_VMEMMAP)
> > ... [ cut ] ...
> > memory map layout: new <--- BTW, this is
> > redundant/not-a-member, and we've got NEW_VMEMMAP
> > VA_BITS: 42
> > userspace_top: 0000040000000000
> > page_offset: fffffe0000000000
> > vmalloc_start_addr: fffffc0008000000
> > vmalloc_end: fffffdff5ffeffff
> > modules_vaddr: fffffc0000000000
> > modules_end: fffffc0007ffffff
> > vmemmap_vaddr: fffffdff80000000
> > vmemmap_end: fffffdffffffffff
> > kimage_text: fffffc0008080000
> > kimage_end: fffffc0009070000
> > kimage_voffset: 0000000000000000 <-- available if kernel is
> > not randomized
> > phys_offset: 4000000000
> > ...
> >
> > But it can be read:
> >
> > crash> px kimage_voffset
> > kimage_voffset = $1 = 0xfffffbc008000000
> > crash>
> >
> > SO because it wasn't determined during session initialization,
> > it falls into the "no randomness" section of arm64_VTOP():
>
> Exactly.
>
> > ulong
> > arm64_VTOP(ulong addr)
> > {
> > if (!(machdep->flags & NEW_VMEMMAP) ||
> > (addr >= machdep->machspec->page_offset)) {
> > return machdep->machspec->phys_offset
> > + (addr - machdep->machspec->page_offset);
> > } else {
> > if (machdep->machspec->kimage_voffset)
> > return addr -
machdep->machspec->kimage_voffset;
> > else /* no randomness */
> > return machdep->machspec->phys_offset
> > + (addr -
> >
machdep->machspec->vmalloc_start_addr);
> > }
> > }
> >
> > That works, but if "kimage_offset" can be read normally later on,
perhaps it
> > should update "machdep->machspec->kimage_voffset" at that
time?
>
> It is possible, but the check in arm64_VTOP() would be still necessary.
Agreed...
Keep it there.
>
> > There are some discrepancies with respect to the calculated addresses
> > and what is seen by looking at the reported memory map in the kernel log:
> >
> > [ 0.000000] Virtual kernel memory layout:
> > [ 0.000000] modules : 0xfffffc0000000000 - 0xfffffc0008000000 (
> > 128 MB)
> > [ 0.000000] vmalloc : 0xfffffc0008000000 - 0xfffffdfedfff0000 (
> > 2043 GB)
> > [ 0.000000] .text : 0xfffffc0008080000 - 0xfffffc0008890000 (
> > 8256 KB)
> > .rodata : 0xfffffc0008890000 - 0xfffffc0008c10000 (
> > 3584 KB)
> > .init : 0xfffffc0008c10000 - 0xfffffc0008d50000 (
> > 1280 KB)
> > .data : 0xfffffc0008d50000 - 0xfffffc0008eaac00 (
> > 1387 KB)
> > [ 0.000000] vmemmap : 0xfffffdfee0000000 - 0xfffffdffe0000000 (
> > 4 GB maximum)
> > 0xfffffdfee0000000 - 0xfffffdfee1000000 (
> > 16 MB actual)
> > [ 0.000000] fixed : 0xfffffdfffe7d0000 - 0xfffffdfffec00000 (
> > 4288 KB)
> > [ 0.000000] PCI I/O : 0xfffffdfffee00000 - 0xfffffdffffe00000 (
> > 16 MB)
> > [ 0.000000] memory : 0xfffffe0000000000 - 0xfffffe0400000000 (
> > 16384 MB)
> >
> > Comparing it to the calculated values:
> >
> > VA_BITS: 42
> > userspace_top: 0000040000000000
> > page_offset: fffffe0000000000 OK
> > vmalloc_start_addr: fffffc0008000000 OK
> > vmalloc_end: fffffdff5ffeffff ? (seems wrong)
> > modules_vaddr: fffffc0000000000 OK
> > modules_end: fffffc0007ffffff OK (-1)
> > vmemmap_vaddr: fffffdff80000000 ? (seems wrong)
> > vmemmap_end: fffffdffffffffff ? (seems wrong)
> > kimage_text: fffffc0008080000 OK
> > kimage_end: fffffc0009070000 OK
> > kimage_voffset: 0000000000000000
> > phys_offset: 4000000000
> >
> > Starting with vmalloc_start_addr/vmalloc_end, if I run "kmem -v" to
dump
> > the
> > vmalloc allocations, there are some allocations that are below the
> > 0xfffffc0008000000 start value shown in the log and by your calcuation.
> > I'm not clear on what that means?:
> >
> > crash> kmem -v
> > VMAP_AREA VM_STRUCT ADDRESS RANGE
> > SIZE
> > fffffe03daefd900 fffffe03daefd880 fffffc0000c10000 - fffffc0000c30000
> > 131072
> > fffffe00bf2c3a00 fffffe00bf2c3980 fffffc0000c90000 - fffffc0000cb0000
> > 131072
> > fffffe00bf2c7400 fffffe00bf2c7380 fffffc0000d50000 - fffffc0000d70000
> > 131072
> > fffffe03dc76aa00 fffffe03dc76a980 fffffc0000d90000 - fffffc0000db0000
> > 131072
> > fffffe03dc76be00 fffffe03dc76bd80 fffffc0000dd0000 - fffffc0000ee0000
> > 1114112
> > fffffe03d90e7900 fffffe03d90e7b80 fffffc0000f40000 - fffffc0000fb0000
> > 458752
> > fffffe00bec12f00 fffffe00bec12b00 fffffc0000fe0000 - fffffc0001000000
> > 131072
> > fffffe00bf3a1100 fffffe00bf3a1080 fffffc0001020000 - fffffc0001050000
> > 196608
> > fffffe00bf3a3280 fffffe00bf3a3200 fffffc0001070000 - fffffc0001090000
> > 131072
> > fffffe03ddf7fc00 fffffe03ddf7a880 fffffc0001230000 - fffffc0001250000
> > 131072
> > fffffe03fd508900 fffffe03fd508880 fffffc0008000000 - fffffc0008020000
> > 131072
> > fffffe03fd509580 fffffe03fd508d00 fffffc0008020000 - fffffc0008040000
> > 131072
> > fffffe03fd508a00 fffffe03fd508980 fffffc0008040000 - fffffc0008070000
> > 196608
> > ... [ cut ] ...
> > fffffe03d7235000 fffffe03d7233480 fffffc0014b30000 - fffffc0014b50000
> > 131072
> > fffffe00b921e700 fffffe00b921df00 fffffc0014b70000 - fffffc0014b90000
> > 131072
> > fffffe00b851eb00 fffffe00b8512900 fffffdfedfcf0000 - fffffdfedfe70000
> > 1572864
> > fffffe03dca4ef00 fffffe03dca4ef80 fffffdfedfe70000 - fffffdfedfff0000
> > 1572864
> > crash>
> >
> > Although the end value of fffffdfedfff0000 does match up with the value in
> > the
> > kernel log, your calculation is 2GB higher at fffffdff5ffeffff:
> >
> > crash> eval fffffdff5ffeffff - 0xfffffdfedfff0000
> > hexadecimal: 7fffffff
> > decimal: 2147483647
> > octal: 17777777777
> > binary:
> > 0000000000000000000000000000000001111111111111111111111111111111
> > crash> eval 7fffffff + 1
> > hexadecimal: 80000000 (2GB)
> > decimal: 2147483648
> > octal: 20000000000
> > binary:
> > 0000000000000000000000000000000010000000000000000000000000000000
> > crash>
> >
> > With respect to the vmemmap range, vmmemap address in the log file is the
> > one
> > that is displayed by "kmem -p":
> >
> > crash> kmem -p
> > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> > fffffdfee0000000 4000000000 fffffe03fd441950 2f 1 1002c
> > referenced,uptodate,lru,mappedtodisk
> > fffffdfee0000040 4000010000 fffffe03fd441950 30 1 1002c
> > referenced,uptodate,lru,mappedtodisk
> > fffffdfee0000080 4000020000 fffffe03fd441950 31 1 1002c
> > referenced,uptodate,lru,mappedtodisk
> > fffffdfee00000c0 4000030000 fffffe03fd441950 32 1 1002c
> > referenced,uptodate,lru,mappedtodisk
> > ...
> > fffffdfee0ffff00 43fffc0000 0 0 1
> > 4000000000000400 reserved
> > fffffdfee0ffff40 43fffd0000 0 0 1
> > 4000000000000400 reserved
> > fffffdfee0ffff80 43fffe0000 0 0 1
> > 4000000000000400 reserved
> > fffffdfee0ffffc0 43ffff0000 0 0 1
> > 4000000000000400 reserved
> > crash>
> >
> > And "kmem" alone recognizes it:
> >
> > crash> kmem fffffdfee0000000
> > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> > fffffdfee0000000 4000000000 fffffe03fd441950 2f 1 1002c
> > referenced,uptodate,lru,mappedtodisk
> > crash>
> >
> > But your calculated vmemmap_vaddr above doesn't seem right:
> >
> > crash> kmem fffffdff80000000
> > kmem: WARNING: cannot make virtual-to-physical translation:
> > fffffdff80000000
> > fffffdff80000000: kernel virtual address not found in mem map
> > crash>
> >
> > Do your numbers match up on a dump that actually has CONFIG_RANDOMIZE_BASE?
>
> In my environment, the numbers above match up on a dump whether
> CONFIG_RANDOMIZE_BASE is enabled or not.
>
> Are you using a mainline kernel (final v4.6, not -rcX)?
Good point. When I configured my arm64 test system, I installed the latest
Fedora kernel available at the time (4.6.0-0.rc7.git2.1.fc25), but it is based
upon linux-4.5.tar.xz. I see now that there is a kernel-4.6.0-1.fc25 available
that is based upon linux-4.6.tar.xz. I'll update the system.
> > Anyway, onto the patch...
> >
> > >
> > > changes in v2:
> > > * Fixed build warnings
> > > * Moved ARM64_NEW_VMEMMAP to machdep->flags
> > > * Show additional kaslr-related parameters in arm64_dump_machdep_table()
> > > * Handle a VMCOREINFO, "NUMBER(kimage_voffset)"
> > >
> > > ===8<===
> > > >From 080a54ec232ac48ef2d8123cbedcf0f1fe27e391 Mon Sep 17 00:00:00
2001
> > > From: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
> > > Date: Mon, 16 May 2016 17:31:55 +0900
> > > Subject: [PATCH v2] arm64: fix kernel memory map handling for
> > > kaslr-enabled
> > > kernel
> > >
> > > In kernel v4.6, Kernel ASLR (KASLR) is supported on arm64, and the start
> > > address of the kernel image can be randomized if CONFIG_RANDOMIZE_BASE is
> > > enabled.
> > > Even worse, the kernel image is no more mapped in the linear mapping, but
> > > in vmalloc area (i.e. below PAGE_OFFSET).
> > >
> > > Now, according to the kernel's memory.h, converting a virtual address
to
> > > a physical address should be done like below:
> > >
> > > phys_addr_t __x = (phys_addr_t)(x); \
> > > __x & BIT(VA_BITS - 1) ? (__x & ~PAGE_OFFSET) + PHYS_OFFSET :
\
> > > (__x - kimage_voffset); })
> > >
> > > Please note that PHYS_OFFSET is no more equal to the start address of
> > > the first usable memory block in SYSTEM RAM due to the fact mentioned
> > > above.
> > >
> > > This patch addresses this change and allows the crash utility to access
> > > memory contents with correct addresses.
> > > *However*, it is still rough-edged and we need more refinements.
> > >
> > > 1-1) On a live system, crash with this patch won't work because
> > > we currently have no way to know kimage_voffset.
> > >
> > > 1-2) For a core dump file, we can do simply:
> > > $ crash <vmlinux> <vmcore>
> > > as long as the file has "NUMBER(kimage_voffset)"
> > > (RELOC_AUTO|KASLR is automatically set.)
> > >
> > > I'm planning to add this enhancement in my next version of
kexec/kdump
> > > patch, i.e. v17.
> > >
> > > 2) My current change breaks the backward compatibility.
> > > Crash won't work with v4.5 or early kernel partly because I
changed
> > > the calculation of VA_BITS. (I think that the existing code is wrong.)
> > > So far I don't have enough time to look into this issue as I'm
> > > focusing
> > > on KASLR support.
> >
> > OK, but you still haven't shown how the current VA_BITS determination is
> > not a correct representation of CONFIG_ARM64_VA_BITS.
>
> At least, the current VA_BITS doesn't work with my calculations, which
> were derived directly from v4.6's "asm/memory.h".
What are your kernel's PAGE_OFFSET value and CONFIG_ARM64_VA_BITS value?
>
> > > 3) Backtracing a 'panic'ed task fails:
> > > crash> bt
> > > PID: 999 TASK: ffffffe960081800 CPU: 0 COMMAND: "sh"
> > > bt: WARNING: corrupt prstatus? pstate=0x80000000, but no user frame
> > > found
> > > bt: WARNING: cannot determine starting stack frame for task
> > > ffffffe960081800
> > >
> > > frame->pc indicates that it is a kernel address, so obviously
something
> > > is wrong. I'm diggin into it.
> >
> > Can you add a debug statement that displays frame->pc, frame->sp, and
> > frame->fp?
>
> I've already identified the cause.
> pt_regs->pstate was set mistakenly from "sprsr_el1" in panic().
> But I have a difficulty here to fix the problem as we have no direct way
> to retrieve a value of the *current* PSTATE.
Interesting. I don't know what you mean by "sprsr_el1" in panic(), but
I just got a report from my github "issues", where the user injected a panic()
call into a kernel module and got the same error as above because of the pstate
value. I asked him to put this hack into arm64.c:
spspr_el1 is a register that holds a value of PSTATE when an exception
is trapped into EL1. Since panic() is called even in case of software
bugs, it is not useful anyway.
Currently I'm trying to save a "faked" current PSTATE into
pt_regs->pstate
and will add this fix in my next kdump patch series (v17).
--- a/arm64.c
+++ b/arm64.c
@@ -1711,11 +1711,14 @@ arm64_get_dumpfile_stackframe(struct bt_info *bt, struct
arm64_stackframe *frame
error(WARNING,
"corrupt prstatus? pstate=0x%lx, but no user frame
found\n",
ptregs->pstate);
+ if (INSTACK(frame->sp, bt) && INSTACK(frame->fp,
bt))
+ goto try_kernel;
bt->flags |= BT_REGS_NOT_FOUND;
return FALSE;
}
bt->flags |= BT_USER_SPACE;
} else {
+try_kernel:
frame->sp = ptregs->sp;
frame->fp = ptregs->regs[29];
}
Can you give that a shot?
Well, I think I've already fixed...
> >
> > > Signed-off-by: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
> > > ---
> > > arm64.c | 217
> > > ++++++++++++++++++++++++++++++++++++++++++++++++--------------
> > > defs.h | 24 +++++--
> > > main.c | 7 +-
> > > symbols.c | 10 +--
> > > 4 files changed, 196 insertions(+), 62 deletions(-)
> > >
> > > diff --git a/arm64.c b/arm64.c
> > > index 34c8c59..6b97093 100644
> > > --- a/arm64.c
> > > +++ b/arm64.c
> > > @@ -72,6 +72,21 @@ static int arm64_get_crash_notes(void);
> > > static void arm64_calc_VA_BITS(void);
> > > static int arm64_is_uvaddr(ulong, struct task_context *);
> > >
> > > +ulong
> > > +arm64_VTOP(ulong addr)
> > > +{
> > > + if (!(machdep->flags & NEW_VMEMMAP) ||
> > > + (addr >= machdep->machspec->page_offset)) {
> > > + return machdep->machspec->phys_offset
> > > + + (addr - machdep->machspec->page_offset);
> > > + } else {
> > > + if (machdep->machspec->kimage_voffset)
> > > + return addr - machdep->machspec->kimage_voffset;
> > > + else /* no randomness */
> > > + return machdep->machspec->phys_offset
> > > + + (addr - machdep->machspec->vmalloc_start_addr);
> > > + }
> > > +}
> > >
> > > /*
> > > * Do all necessary machine-specific setup here. This is called several
> > > times
> > > @@ -81,6 +96,7 @@ void
> > > arm64_init(int when)
> > > {
> > > ulong value;
> > > + char *string;
> > > struct machine_specific *ms;
> > >
> > > #if defined(__x86_64__)
> > > @@ -102,9 +118,34 @@ arm64_init(int when)
> > > if (machdep->cmdline_args[0])
> > > arm64_parse_cmdline_args();
> > > machdep->flags |= MACHDEP_BT_TEXT;
> > > +
> > > + ms = machdep->machspec;
> > > + if (!ms->kimage_voffset &&
> > > + (string =
pc->read_vmcoreinfo("NUMBER(kimage_voffset)"))) {
> > > + ms->kimage_voffset = htol(string, QUIET, NULL);
> > > + free(string);
> > > + }
> >
> > It doesn't make sense to call pc->read_vmcoreinfo() if !DUMPFILE().
>
> Do you think so?
> By default, pc->read_vmcoreinfo is set to no_vmcoreinfo.
> So it doesn't hurt anything.
Right, just a suggestion...
So keep it there for now.
> > > +
> > > + if (ms->kimage_voffset) {
> > > + machdep->flags |= NEW_VMEMMAP;
> > > +
> > > + /* if not specified in command line */
> > > + if (!kt->relocate && !(kt->flags2 &
(RELOC_AUTO|KASLR)))
> > > + kt->flags2 |= (RELOC_AUTO|KASLR);
> > > + }
> > > +
> >
> > If CONFIG_RANDOMIZE_BASE is NOT set, should RELOC_AUTO|KASLR be set? They
> > don't get set on my live system.
>
> Not necessarily, but the current implementation of derive_kaslr_offset()
> just works even in this case.
Ahah -- cool!
kt->relocate will be set to 0 in this case, we will see no difference
between KASLR and non-KASLR case.
> (This is the reason that I don't think we need a VMCOREINFO
for
> CONFIG_RANDOMIZE_BASE.)
Well, as long as there's a way to avoid that unnecessary/confusing
warning message for non-KASLR new-vmemmap kernels.
I also wonder whether makedumpfile could use it?
-> Pratyush?
> > > break;
> > >
> > > case PRE_GDB:
> > > + /* This check is somewhat redundant */
> > > + if (kernel_symbol_exists("kimage_voffset")) {
> > > + machdep->flags |= NEW_VMEMMAP;
> > > +
> > > + if (!machdep->machspec->kimage_voffset)
> > > + error(WARNING,
> > > + "kimage_voffset not identified for kaslr kernel\n");
> > > + }
> > > +
> >
> > As mentioned in the general discussion above, the WARNING above makes
> > no sense if CONFIG_RANDOMIZE_BASE is not configured.
>
> What about if I change the message "for v4.6 or later kernel"?
But again, I do not want the message at all if it's not applicable.
I removed this message.
Worse case, you could set a flag, and then if/when the crash session
ultimately craps out if it can't figure things out, then the message
could be displayed post-mortem.
> > I'm now thinking that CONFIG_RANDOMIZE_BASE (and maybe others like KASAN?)
> > should be passed in the dumpfile with VMCOREINFO_CONFIG().
>
> Regarding KASAN, we only need to check for an existence of "kasan_init"
> symbol as you recommended before :)
Nice.
> >
> > > if (!machdep->pagesize) {
> > > /*
> > > * Kerneldoc Documentation/arm64/booting.txt describes
> > > @@ -160,16 +201,35 @@ arm64_init(int when)
> > > machdep->pagemask = ~((ulonglong)machdep->pageoffset);
> > >
> > > arm64_calc_VA_BITS();
> > > - machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> > > + ms = machdep->machspec;
> > > + ms->page_offset = ARM64_PAGE_OFFSET;
> > > + /* FIXME: idmap for NEW_VMEMMAP */
> > > machdep->identity_map_base = ARM64_PAGE_OFFSET;
> > > - machdep->machspec->userspace_top = ARM64_USERSPACE_TOP;
> > > - machdep->machspec->modules_vaddr = ARM64_MODULES_VADDR;
> > > - machdep->machspec->modules_end = ARM64_MODULES_END;
> > > - machdep->machspec->vmalloc_start_addr = ARM64_VMALLOC_START;
> > > - machdep->machspec->vmalloc_end = ARM64_VMALLOC_END;
> > > - machdep->kvbase = ARM64_VMALLOC_START;
> > > - machdep->machspec->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > > - machdep->machspec->vmemmap_end = ARM64_VMEMMAP_END;
> > > + machdep->kvbase = ARM64_VA_START;
> > > + ms->userspace_top = ARM64_USERSPACE_TOP;
> > > + if (machdep->flags & NEW_VMEMMAP) {
> > > + struct syment *sp;
> > > +
> > > + sp = kernel_symbol_search("_text");
> > > + ms->kimage_text = (sp ? sp->value : 0);
> > > + sp = kernel_symbol_search("_end");
> > > + ms->kimage_end = (sp ? sp->value : 0);
> > > +
> > > + ms->modules_vaddr = ARM64_VA_START;
> > > + if (kernel_symbol_exists("kasan_init"))
> > > + ms->modules_vaddr += ARM64_KASAN_SHADOW_SIZE;
> > > + ms->modules_end = ms->modules_vaddr
> > > + + ARM64_MODULES_VSIZE -1;
> > > +
> > > + ms->vmalloc_start_addr = ms->modules_end + 1;
> > > + } else {
> > > + ms->modules_vaddr = ARM64_PAGE_OFFSET - MEGABYTES(64);
> > > + ms->modules_end = ARM64_PAGE_OFFSET - 1;
> > > + ms->vmalloc_start_addr = ARM64_VA_START;
> > > + }
> > > + ms->vmalloc_end = ARM64_VMALLOC_END;
> > > + ms->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > > + ms->vmemmap_end = ARM64_VMEMMAP_END;
> > >
> > > switch (machdep->pagesize)
> > > {
> > > @@ -232,8 +292,6 @@ arm64_init(int when)
> > > machdep->stacksize = ARM64_STACK_SIZE;
> > > machdep->flags |= VMEMMAP;
> > >
> > > - arm64_calc_phys_offset();
> > > -
> > > machdep->uvtop = arm64_uvtop;
> > > machdep->kvtop = arm64_kvtop;
> > > machdep->is_kvaddr = generic_is_kvaddr;
> > > @@ -262,6 +320,10 @@ arm64_init(int when)
> > > machdep->dumpfile_init = NULL;
> > > machdep->verify_line_number = NULL;
> > > machdep->init_kernel_pgd = arm64_init_kernel_pgd;
> > > +
> > > + /* use machdep parameters */
> > > + arm64_calc_phys_offset();
> > > +
> >
> > This doesn't make any sense to me. I understand you've done it
because
> > you are now doing a readmem() in arm64_calc_phys_offset(). But readmem()
> > calls are never done this early. And in fact, on the live system the
> > readmem() in arm64_calc_phys_offset() fails like so (running "crash
-d4"):
> >
> > <readmem: fffffc0008d70bb0, KVADDR, "memstart_addr", 8,
(ROE|Q),
> > 3ffcc2842c0>
> > <read_memory_device: addr: fffffc0008d70bb0 paddr: d70bb0 cnt: 8>
> > crash: read error: kernel virtual address: fffffc0008d70bb0 type:
> > "memstart_addr"
> >
> > It fails above because it needs the phys_base in order to do the readmem().
> > During
> > runtime it has it:
>
> This is because we don't know kimage_voffset on a live system.
> (and so I mentioned this issue in my commit log.)
Right -- which is a good reason for NOT using readmem() at that point
in time. I can also envision a scenario where readmem() may inadvertently
"work" (i.e. depending upon where phys_base really is), and then return
something bogus from an unintended location.
> As far as kdump case is concerned, readmem() at this point just works
> for any address in kernel image. This is my intentional use of this function.
I understand, but you're subverting the subsequent dumpfile-specific
calls later in the function. It is precisely *their* responsibility
to return the value of phys_offset.
I think that the code should precede the other cases, including
diskdump and kdump. The only exception would be a live sysmem with
CONFIG_RANDOMIZE_RAM configured.
I will temporarily add a check here like:
arm64_calc_phys_offset()
{
if (machdep->flags & NEW_VMEMMAP)
sp = kernel_symbol_search("memstart_addr");
readmem(sp->value, ...);
...
}
This way, even on a live system with no KASLR, "phys_offset" will be
determined based on /proc/iomem, and eventually the crash should work.
(Sorry, I haven't tested it yet though.)
> > crash> set debug 4
> > debug: 4
> > crash> rd memstart_addr
> > <addr: fffffc0008d70bb0 count: 1 flag: 490 (KVADDR)>
> > <readmem: fffffc0008d70bb0, KVADDR, "64-bit KVADDR", 8, (FOE),
> > 3ffcbfbfc28>
> > <read_memory_device: addr: fffffc0008d70bb0 paddr: 4000d70bb0 cnt: 8>
> > fffffc0008d70bb0: 0000004000000000 ....@...
> > crash>
> >
> > So without kimage_voffset, that readmem(), how would that readmem()
> > possibly work?
> > You need the *contents* of memstart_addr in order to *read* memstart_addr.
> >
> > Now, if it kimage_voffset were passed into the vmcore, then I guess you
> > could
> > read it. But why do it that way? That's what diskdump_get_phys_base()
and
> > kdump_get_phys_base() are supposed to do.
>
> Because I believe that this is the *common* place to do it either for
> diskdump, or kdump ( or even a live system if we know kimage_voffset.)
Well, as I have shown, it won't work on my live system because of the
chicken-and-egg scenario. Let's worry about live system analysis after
we get kdump straightened out.
See above.
> >
> > > break;
> > >
> > > case POST_GDB:
> > > @@ -409,6 +471,8 @@ arm64_dump_machdep_table(ulong arg)
> > > fprintf(fp, "%sIRQ_STACKS", others++ ? "|" :
"");
> > > if (machdep->flags & MACHDEP_BT_TEXT)
> > > fprintf(fp, "%sMACHDEP_BT_TEXT", others++ ? "|" :
"");
> > > + if (machdep->flags & NEW_VMEMMAP)
> > > + fprintf(fp, "%sNEW_VMEMMAP", others++ ? "|" :
"");
> > > fprintf(fp, ")\n");
> > >
> > > fprintf(fp, " kvbase: %lx\n",
machdep->kvbase);
> > > @@ -494,6 +558,8 @@ arm64_dump_machdep_table(ulong arg)
> > > ms = machdep->machspec;
> > >
> > > fprintf(fp, " machspec: %lx\n", (ulong)ms);
> > > + fprintf(fp, " memory map layout: %s\n",
> > > + (machdep->flags & NEW_VMEMMAP) ? "new" :
"old");
> >
> > As before, these internal "help" structure dumps just show existing
> > structure fields,
> > and NEW_VMMEMAP is already shown in the flags.
>
> I put it as useful information for those people who might have got
> any concerns about those odd numbers/kernel layout.
> But if you don't like it, I will remove it.
I removed it.
> >
> > > fprintf(fp, " VA_BITS: %ld\n", ms->VA_BITS);
> > > fprintf(fp, " userspace_top: %016lx\n",
ms->userspace_top);
> > > fprintf(fp, " page_offset: %016lx\n",
ms->page_offset);
> > > @@ -503,6 +569,11 @@ arm64_dump_machdep_table(ulong arg)
> > > fprintf(fp, " modules_end: %016lx\n",
ms->modules_end);
> > > fprintf(fp, " vmemmap_vaddr: %016lx\n",
ms->vmemmap_vaddr);
> > > fprintf(fp, " vmemmap_end: %016lx\n",
ms->vmemmap_end);
> > > + if (machdep->flags & NEW_VMEMMAP) {
> > > + fprintf(fp, " kimage_text: %016lx\n",
ms->kimage_text);
> > > + fprintf(fp, " kimage_end: %016lx\n",
ms->kimage_end);
> > > + fprintf(fp, " kimage_voffset: %016lx\n",
ms->kimage_voffset);
> > > + }
> > > fprintf(fp, " phys_offset: %lx\n",
ms->phys_offset);
> > > fprintf(fp, "__exception_text_start: %lx\n",
> > > ms->__exception_text_start);
> > > fprintf(fp, " __exception_text_end: %lx\n",
ms->__exception_text_end);
> > > @@ -543,6 +614,42 @@ arm64_dump_machdep_table(ulong arg)
> > > }
> > > }
> > >
> > > +static int
> > > +arm64_parse_machdep_arg_l(char *argstring, char *param, ulong *value)
> > > +{
> > > + int len;
> > > + int megabytes = FALSE;
> > > + char *p;
> > > +
> > > + len = strlen(param);
> > > + if (!STRNEQ(argstring, param) || (argstring[len] != '='))
> > > + return FALSE;
> > > +
> > > + if ((LASTCHAR(argstring) == 'm') ||
> > > + (LASTCHAR(argstring) == 'M')) {
> > > + LASTCHAR(argstring) = NULLCHAR;
> > > + megabytes = TRUE;
> > > + }
> > > +
> > > + p = argstring + len + 1;
> > > + if (strlen(p)) {
> > > + int flags = RETURN_ON_ERROR | QUIET;
> > > + int err = 0;
> > > +
> > > + if (megabytes) {
> > > + *value = dtol(p, flags, &err);
> > > + if (!err)
> > > + *value = MEGABYTES(*value);
> > > + } else {
> > > + *value = htol(p, flags, &err);
> > > + }
> > > +
> > > + if (!err)
> > > + return TRUE;
> > > + }
> > > +
> > > + return FALSE;
> > > +}
> > >
> > > /*
> > > * Parse machine dependent command line arguments.
> > > @@ -554,11 +661,10 @@ arm64_dump_machdep_table(ulong arg)
> > > static void
> > > arm64_parse_cmdline_args(void)
> > > {
> > > - int index, i, c, err;
> > > + int index, i, c;
> > > char *arglist[MAXARGS];
> > > char buf[BUFSIZE];
> > > char *p;
> > > - ulong value = 0;
> > >
> > > for (index = 0; index < MAX_MACHDEP_ARGS; index++) {
> > > if (!machdep->cmdline_args[index])
> > > @@ -580,39 +686,23 @@ arm64_parse_cmdline_args(void)
> > > c = parse_line(buf, arglist);
> > >
> > > for (i = 0; i < c; i++) {
> > > - err = 0;
> > > -
> > > - if (STRNEQ(arglist[i], "phys_offset=")) {
> > > - int megabytes = FALSE;
> > > - int flags = RETURN_ON_ERROR | QUIET;
> > > -
> > > - if ((LASTCHAR(arglist[i]) == 'm') ||
> > > - (LASTCHAR(arglist[i]) == 'M')) {
> > > - LASTCHAR(arglist[i]) = NULLCHAR;
> > > - megabytes = TRUE;
> > > - }
> > > -
> > > - p = arglist[i] + strlen("phys_offset=");
> > > - if (strlen(p)) {
> > > - if (megabytes)
> > > - value = dtol(p, flags, &err);
> > > - else
> > > - value = htol(p, flags, &err);
> > > - }
> > > -
> > > - if (!err) {
> > > - if (megabytes)
> > > - value = MEGABYTES(value);
> > > -
> > > - machdep->machspec->phys_offset = value;
> > > -
> > > - error(NOTE,
> > > - "setting phys_offset to: 0x%lx\n\n",
> > > - machdep->machspec->phys_offset);
> > > + if (arm64_parse_machdep_arg_l(arglist[i],
> > > + "phys_offset",
> > > + &machdep->machspec->phys_offset)) {
> > > + error(NOTE,
> > > + "setting phys_offset to: 0x%lx\n\n",
> > > + machdep->machspec->phys_offset);
> > > +
> > > + machdep->flags |= PHYS_OFFSET;
> > > + continue;
> > > + } else if (arm64_parse_machdep_arg_l(arglist[i],
> > > + "kimage_voffset",
> > > + &machdep->machspec->kimage_voffset)) {
> > > + error(NOTE,
> > > + "setting kimage_voffset to: 0x%lx\n\n",
> > > + machdep->machspec->kimage_voffset);
> > >
> > > - machdep->flags |= PHYS_OFFSET;
> > > - continue;
> > > - }
> > > + continue;
> > > }
> > >
> > > error(WARNING, "ignoring --machdep option: %s\n",
> > > @@ -627,10 +717,20 @@ arm64_calc_phys_offset(void)
> > > {
> > > struct machine_specific *ms = machdep->machspec;
> > > ulong phys_offset;
> > > + struct syment *sp;
> > > + ulong value;
> > >
> > > if (machdep->flags & PHYS_OFFSET) /* --machdep override */
> > > return;
> > >
> > > + sp = kernel_symbol_search("memstart_addr");
> > > + if (sp && readmem(sp->value, KVADDR, (char *)&value,
sizeof(value),
> > > + "memstart_addr", QUIET|RETURN_ON_ERROR)) {
> > > + ms->phys_offset = value;
> > > + machdep->flags |= PHYS_OFFSET;
> > > + return;
> > > + }
> > > +
> >
> > Also, I may be missing something, but the PHYS_OFFSET flag means that it
> > was
> > passed in via --machdep, and is only used in this function to avoid any
> > further
> > attempts to figure out what it is. Here you are setting the flag for no
> > particular
> > reason that I can see. Why?
>
> Well, I don't remember very well. If not necessary, I will remove it.
OK, I guarantee it's useless in this case...
I removed it.
Thanks,
-Takahiro AKASHI
> > But again, doing a readmem() call this early is setting a
bad precedent --
> > they have
> > *always* been delayed until after gdb_session_init() in main_loop(). And
> > the work
> > of determining phys_base should be done by the dumpfile call-out functions
> > that
> > already exist.
> >
> >
> > > /*
> > > * Next determine suitable value for phys_offset. User can override
> > > this
> > > * by passing valid '--machdep phys_offset=<addr>'
option.
> > > @@ -2377,6 +2477,11 @@ arm64_IS_VMALLOC_ADDR(ulong vaddr)
> > > {
> > > struct machine_specific *ms = machdep->machspec;
> > >
> > > + if ((machdep->flags & NEW_VMEMMAP) &&
> > > + (vaddr >= machdep->machspec->kimage_text) &&
> > > + (vaddr <= machdep->machspec->kimage_end))
> > > + return FALSE;
> > > +
> > > return ((vaddr >= ms->vmalloc_start_addr && vaddr
<=
> > > ms->vmalloc_end) ||
> > > ((machdep->flags & VMEMMAP) &&
> > > (vaddr >= ms->vmemmap_vaddr && vaddr
<=
> > > ms->vmemmap_end))
> > > ||
> > > @@ -2407,7 +2512,11 @@ arm64_calc_VA_BITS(void)
> > >
> > > for (bitval = highest_bit_long(value); bitval; bitval--) {
> > > if ((value & (1UL << bitval)) == 0) {
> > > +#if 1
> > > + machdep->machspec->VA_BITS = bitval + 1;
> > > +#else /* FIXME: bug? */
> > > machdep->machspec->VA_BITS = bitval + 2;
> > > +#endif
> > > break;
> > > }
> > > }
> >
> > Again, a non-starter until the above is resolved -- but you know that
> > already...
>
> Just a reminder for now.
>
> >
> > > @@ -2459,10 +2568,22 @@ arm64_calc_virtual_memory_ranges(void)
> > > break;
> > > }
> > >
> > > - vmemmap_size = ALIGN((1UL << (ms->VA_BITS -
machdep->pageshift)) *
> > > SIZE(page), PUD_SIZE);
> > > + if (machdep->flags & NEW_VMEMMAP)
> > > +#define STRUCT_PAGE_MAX_SHIFT 6
> > > + vmemmap_size = 1UL << (ms->VA_BITS - machdep->pageshift -
1
> > > + + STRUCT_PAGE_MAX_SHIFT);
> > > + else
> > > + vmemmap_size = ALIGN((1UL << (ms->VA_BITS -
machdep->pageshift)) *
> > > SIZE(page), PUD_SIZE);
> > > +
> > > vmalloc_end = (ms->page_offset - PUD_SIZE - vmemmap_size - SZ_64K);
> > > - vmemmap_start = vmalloc_end + SZ_64K;
> > > - vmemmap_end = vmemmap_start + vmemmap_size;
> > > +
> > > + if (machdep->flags & NEW_VMEMMAP) {
> > > + vmemmap_start = ms->page_offset - vmemmap_size;
> > > + vmemmap_end = ms->page_offset;
> > > + } else {
> > > + vmemmap_start = vmalloc_end + SZ_64K;
> > > + vmemmap_end = vmemmap_start + vmemmap_size;
> > > + }
> > >
> > > ms->vmalloc_end = vmalloc_end - 1;
> > > ms->vmemmap_vaddr = vmemmap_start;
> > > diff --git a/defs.h b/defs.h
> > > index a09fa9a..2bbb55b 100644
> > > --- a/defs.h
> > > +++ b/defs.h
> > > @@ -2844,8 +2844,8 @@ typedef u64 pte_t;
> > >
> > > #define PTOV(X) \
> > > ((unsigned
> > >
long)(X)-(machdep->machspec->phys_offset)+(machdep->machspec->page_offset))
> > > -#define VTOP(X) \
> > > - ((unsigned
> > >
long)(X)-(machdep->machspec->page_offset)+(machdep->machspec->phys_offset))
> > > +
> > > +#define VTOP(X) arm64_VTOP((ulong)(X))
> > >
> > > #define USERSPACE_TOP (machdep->machspec->userspace_top)
> > > #define PAGE_OFFSET (machdep->machspec->page_offset)
> > > @@ -2938,18 +2938,23 @@ typedef signed int s32;
> > > #define VM_L3_4K (0x10)
> > > #define KDUMP_ENABLED (0x20)
> > > #define IRQ_STACKS (0x40)
> > > +#define NEW_VMEMMAP (0x80)
> > >
> > > /*
> > > * sources: Documentation/arm64/memory.txt
> > > * arch/arm64/include/asm/memory.h
> > > * arch/arm64/include/asm/pgtable.h
> > > */
> > > -
> > > -#define ARM64_PAGE_OFFSET ((0xffffffffffffffffUL) <<
> > > (machdep->machspec->VA_BITS - 1))
> > > +#define ARM64_VA_START ((0xffffffffffffffffUL) \
> > > + << machdep->machspec->VA_BITS)
> > > +#define ARM64_PAGE_OFFSET ((0xffffffffffffffffUL) \
> > > + << (machdep->machspec->VA_BITS - 1))
> > > #define ARM64_USERSPACE_TOP ((1UL) <<
machdep->machspec->VA_BITS)
> > > -#define ARM64_MODULES_VADDR (ARM64_PAGE_OFFSET - MEGABYTES(64))
> > > -#define ARM64_MODULES_END (ARM64_PAGE_OFFSET - 1)
> > > -#define ARM64_VMALLOC_START ((0xffffffffffffffffUL) <<
> > > machdep->machspec->VA_BITS)
> > > +
> > > +/* only used for v4.6 or later */
> > > +#define ARM64_MODULES_VSIZE MEGABYTES(128)
> > > +#define ARM64_KASAN_SHADOW_SIZE (1UL <<
(machdep->machspec->VA_BITS -
> > > 3))
> > > +
> > > /*
> > > * The following 3 definitions are the original values, but are obsolete
> > > * for 3.17 and later kernels because they are now build-time
> > > calculations.
> > > @@ -3028,6 +3033,10 @@ struct machine_specific {
> > > ulong kernel_flags;
> > > ulong irq_stack_size;
> > > ulong *irq_stacks;
> > > + /* only needed for kaslr-enabled kernel */
> > > + ulong kimage_voffset;
> > > + ulong kimage_text;
> > > + ulong kimage_end;
> > > };
> >
> > The comment is wrong -- the kimage_text and kimage_end values are needed
> > for
> > NEW_VMEMMAP kernels that are NOT configured with CONFIG_RANDOMIZE_BASE.
> > Unless you consider my live system "kaslr-enabled"?
>
> I will change it to "for v4.6 or later".
>
> I have limited (or no) time to investigate a live-system case, and so if
> you have any suggestions or your own patch, it is very much appreciated.
Yep... but I will build upon your solid foundation.
If I see anything interesting/different after my live system upgrade, I'll let
you know.
Thanks,
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
--
Thanks,
-Takahiro AKASHI