----- Original Message -----
On Tue, May 24, 2016 at 01:59:06PM -0400, Dave Anderson wrote:
>
> ----- Original Message -----
> > Yet some issues, but ...
> >
>
> Hi Takahiro,
>
> Here are my general comments on my testing of the v2 patch, followed
> by a few comments in the patch itself.
Thank you for your quick review/testing, again.
> First, the combination of the new memory map layout and KASLR is somewhat
> confusing. I am testing your patch on a 4.6.0-0.rc7.git2.1.fc25 kernel
> that has this configuration:
Right. I use the "kaslr kernel" or "kaslr-enabled kernel" in two
meanings (almost interchangeably):
- kernel with a new memory map, that is v4.6 or later
- kernel that has ability of KASLR (configured with CONFIG_RANDOMIZE_RAM)
When we talked offline, you mentioned the possibility of backporting
KASLR to older kernel, so I hesitated to use "v4.6 or later".
But I think that we'd better use "v4.6 or later" for a clarification
for now.
Actually I was thinking more along the lines of the new vmemmap stuff
being backported. But it looks like that would be impossible without
also bringing along "kimage_voffset".
> config-arm64:# CONFIG_RANDOMIZE_BASE is not set
>
> So KASLR doesn't really enter into the picture. But when bringing
> up the crash session, it shows the "kaslr kernel" WARNING:
>
> # ./crash
>
> crash 7.1.5++
> Copyright (C) 2002-2016 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> WARNING: kimage_voffset not identified for kaslr kernel
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <
http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show
> copying"
> and "show warranty" for details.
> This GDB was configured as "aarch64-unknown-linux-gnu"...
>
> KERNEL:
> /usr/lib/debug/lib/modules/4.6.0-0.rc7.git2.1.fc25.aarch64/vmlinux
> DUMPFILE: /dev/crash
> CPUS: 8
> DATE: Tue May 24 10:08:08 2016
> UPTIME: 11 days, 18:32:41
> LOAD AVERAGE: 0.17, 0.09, 0.12
> TASKS: 197
> NODENAME:
apm-mustang-ev3-36.khw.lab.eng.bos.redhat.com
> RELEASE: 4.6.0-0.rc7.git2.1.fc25.aarch64
> VERSION: #1 SMP Thu May 12 13:28:43 UTC 2016
> MACHINE: aarch64 (unknown Mhz)
> MEMORY: 16 GB
> PID: 7556
> COMMAND: "crash"
> TASK: fffffe00beb45400 [THREAD_INFO: fffffe00beb98000]
> CPU: 7
> STATE: TASK_RUNNING (ACTIVE)
>
> crash>
>
> Why show that WARNING in this case? I understant that it's not
> stashed during early initialization:
If we don't know kimage_voffset, we can't analyze the contents of
memory (live or kdump).
I understand. But clearly that's not the case with my system above that
does not have CONFIG_RANDOMIZE_BASE configured, so the warning message
is nonsensical/confusing. That's why I was speculating about perhaps
having kdump export a VMCOREINFO_CONFIG(RANDOMIZE_BASE) so that it is
readily apparent.
> crash> help -m
> flags: 104000c5
>
(KSYMS_START|VM_L2_64K|VMEMMAP|IRQ_STACKS|MACHDEP_BT_TEXT|NEW_VMEMMAP)
> ... [ cut ] ...
> memory map layout: new <--- BTW, this is
> redundant/not-a-member, and we've got NEW_VMEMMAP
> VA_BITS: 42
> userspace_top: 0000040000000000
> page_offset: fffffe0000000000
> vmalloc_start_addr: fffffc0008000000
> vmalloc_end: fffffdff5ffeffff
> modules_vaddr: fffffc0000000000
> modules_end: fffffc0007ffffff
> vmemmap_vaddr: fffffdff80000000
> vmemmap_end: fffffdffffffffff
> kimage_text: fffffc0008080000
> kimage_end: fffffc0009070000
> kimage_voffset: 0000000000000000 <-- available if kernel is
> not randomized
> phys_offset: 4000000000
> ...
>
> But it can be read:
>
> crash> px kimage_voffset
> kimage_voffset = $1 = 0xfffffbc008000000
> crash>
>
> SO because it wasn't determined during session initialization,
> it falls into the "no randomness" section of arm64_VTOP():
Exactly.
> ulong
> arm64_VTOP(ulong addr)
> {
> if (!(machdep->flags & NEW_VMEMMAP) ||
> (addr >= machdep->machspec->page_offset)) {
> return machdep->machspec->phys_offset
> + (addr - machdep->machspec->page_offset);
> } else {
> if (machdep->machspec->kimage_voffset)
> return addr - machdep->machspec->kimage_voffset;
> else /* no randomness */
> return machdep->machspec->phys_offset
> + (addr -
> machdep->machspec->vmalloc_start_addr);
> }
> }
>
> That works, but if "kimage_offset" can be read normally later on, perhaps
it
> should update "machdep->machspec->kimage_voffset" at that time?
It is possible, but the check in arm64_VTOP() would be still necessary.
Agreed...
> There are some discrepancies with respect to the calculated addresses
> and what is seen by looking at the reported memory map in the kernel log:
>
> [ 0.000000] Virtual kernel memory layout:
> [ 0.000000] modules : 0xfffffc0000000000 - 0xfffffc0008000000 (
> 128 MB)
> [ 0.000000] vmalloc : 0xfffffc0008000000 - 0xfffffdfedfff0000 (
> 2043 GB)
> [ 0.000000] .text : 0xfffffc0008080000 - 0xfffffc0008890000 (
> 8256 KB)
> .rodata : 0xfffffc0008890000 - 0xfffffc0008c10000 (
> 3584 KB)
> .init : 0xfffffc0008c10000 - 0xfffffc0008d50000 (
> 1280 KB)
> .data : 0xfffffc0008d50000 - 0xfffffc0008eaac00 (
> 1387 KB)
> [ 0.000000] vmemmap : 0xfffffdfee0000000 - 0xfffffdffe0000000 (
> 4 GB maximum)
> 0xfffffdfee0000000 - 0xfffffdfee1000000 (
> 16 MB actual)
> [ 0.000000] fixed : 0xfffffdfffe7d0000 - 0xfffffdfffec00000 (
> 4288 KB)
> [ 0.000000] PCI I/O : 0xfffffdfffee00000 - 0xfffffdffffe00000 (
> 16 MB)
> [ 0.000000] memory : 0xfffffe0000000000 - 0xfffffe0400000000 (
> 16384 MB)
>
> Comparing it to the calculated values:
>
> VA_BITS: 42
> userspace_top: 0000040000000000
> page_offset: fffffe0000000000 OK
> vmalloc_start_addr: fffffc0008000000 OK
> vmalloc_end: fffffdff5ffeffff ? (seems wrong)
> modules_vaddr: fffffc0000000000 OK
> modules_end: fffffc0007ffffff OK (-1)
> vmemmap_vaddr: fffffdff80000000 ? (seems wrong)
> vmemmap_end: fffffdffffffffff ? (seems wrong)
> kimage_text: fffffc0008080000 OK
> kimage_end: fffffc0009070000 OK
> kimage_voffset: 0000000000000000
> phys_offset: 4000000000
>
> Starting with vmalloc_start_addr/vmalloc_end, if I run "kmem -v" to dump
> the
> vmalloc allocations, there are some allocations that are below the
> 0xfffffc0008000000 start value shown in the log and by your calcuation.
> I'm not clear on what that means?:
>
> crash> kmem -v
> VMAP_AREA VM_STRUCT ADDRESS RANGE
> SIZE
> fffffe03daefd900 fffffe03daefd880 fffffc0000c10000 - fffffc0000c30000
> 131072
> fffffe00bf2c3a00 fffffe00bf2c3980 fffffc0000c90000 - fffffc0000cb0000
> 131072
> fffffe00bf2c7400 fffffe00bf2c7380 fffffc0000d50000 - fffffc0000d70000
> 131072
> fffffe03dc76aa00 fffffe03dc76a980 fffffc0000d90000 - fffffc0000db0000
> 131072
> fffffe03dc76be00 fffffe03dc76bd80 fffffc0000dd0000 - fffffc0000ee0000
> 1114112
> fffffe03d90e7900 fffffe03d90e7b80 fffffc0000f40000 - fffffc0000fb0000
> 458752
> fffffe00bec12f00 fffffe00bec12b00 fffffc0000fe0000 - fffffc0001000000
> 131072
> fffffe00bf3a1100 fffffe00bf3a1080 fffffc0001020000 - fffffc0001050000
> 196608
> fffffe00bf3a3280 fffffe00bf3a3200 fffffc0001070000 - fffffc0001090000
> 131072
> fffffe03ddf7fc00 fffffe03ddf7a880 fffffc0001230000 - fffffc0001250000
> 131072
> fffffe03fd508900 fffffe03fd508880 fffffc0008000000 - fffffc0008020000
> 131072
> fffffe03fd509580 fffffe03fd508d00 fffffc0008020000 - fffffc0008040000
> 131072
> fffffe03fd508a00 fffffe03fd508980 fffffc0008040000 - fffffc0008070000
> 196608
> ... [ cut ] ...
> fffffe03d7235000 fffffe03d7233480 fffffc0014b30000 - fffffc0014b50000
> 131072
> fffffe00b921e700 fffffe00b921df00 fffffc0014b70000 - fffffc0014b90000
> 131072
> fffffe00b851eb00 fffffe00b8512900 fffffdfedfcf0000 - fffffdfedfe70000
> 1572864
> fffffe03dca4ef00 fffffe03dca4ef80 fffffdfedfe70000 - fffffdfedfff0000
> 1572864
> crash>
>
> Although the end value of fffffdfedfff0000 does match up with the value in
> the
> kernel log, your calculation is 2GB higher at fffffdff5ffeffff:
>
> crash> eval fffffdff5ffeffff - 0xfffffdfedfff0000
> hexadecimal: 7fffffff
> decimal: 2147483647
> octal: 17777777777
> binary:
> 0000000000000000000000000000000001111111111111111111111111111111
> crash> eval 7fffffff + 1
> hexadecimal: 80000000 (2GB)
> decimal: 2147483648
> octal: 20000000000
> binary:
> 0000000000000000000000000000000010000000000000000000000000000000
> crash>
>
> With respect to the vmemmap range, vmmemap address in the log file is the
> one
> that is displayed by "kmem -p":
>
> crash> kmem -p
> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> fffffdfee0000000 4000000000 fffffe03fd441950 2f 1 1002c
> referenced,uptodate,lru,mappedtodisk
> fffffdfee0000040 4000010000 fffffe03fd441950 30 1 1002c
> referenced,uptodate,lru,mappedtodisk
> fffffdfee0000080 4000020000 fffffe03fd441950 31 1 1002c
> referenced,uptodate,lru,mappedtodisk
> fffffdfee00000c0 4000030000 fffffe03fd441950 32 1 1002c
> referenced,uptodate,lru,mappedtodisk
> ...
> fffffdfee0ffff00 43fffc0000 0 0 1
> 4000000000000400 reserved
> fffffdfee0ffff40 43fffd0000 0 0 1
> 4000000000000400 reserved
> fffffdfee0ffff80 43fffe0000 0 0 1
> 4000000000000400 reserved
> fffffdfee0ffffc0 43ffff0000 0 0 1
> 4000000000000400 reserved
> crash>
>
> And "kmem" alone recognizes it:
>
> crash> kmem fffffdfee0000000
> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> fffffdfee0000000 4000000000 fffffe03fd441950 2f 1 1002c
> referenced,uptodate,lru,mappedtodisk
> crash>
>
> But your calculated vmemmap_vaddr above doesn't seem right:
>
> crash> kmem fffffdff80000000
> kmem: WARNING: cannot make virtual-to-physical translation:
> fffffdff80000000
> fffffdff80000000: kernel virtual address not found in mem map
> crash>
>
> Do your numbers match up on a dump that actually has CONFIG_RANDOMIZE_BASE?
In my environment, the numbers above match up on a dump whether
CONFIG_RANDOMIZE_BASE is enabled or not.
Are you using a mainline kernel (final v4.6, not -rcX)?
Good point. When I configured my arm64 test system, I installed the latest
Fedora kernel available at the time (4.6.0-0.rc7.git2.1.fc25), but it is based
upon linux-4.5.tar.xz. I see now that there is a kernel-4.6.0-1.fc25 available
that is based upon linux-4.6.tar.xz. I'll update the system.
> Anyway, onto the patch...
>
> >
> > changes in v2:
> > * Fixed build warnings
> > * Moved ARM64_NEW_VMEMMAP to machdep->flags
> > * Show additional kaslr-related parameters in arm64_dump_machdep_table()
> > * Handle a VMCOREINFO, "NUMBER(kimage_voffset)"
> >
> > ===8<===
> > >From 080a54ec232ac48ef2d8123cbedcf0f1fe27e391 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
> > Date: Mon, 16 May 2016 17:31:55 +0900
> > Subject: [PATCH v2] arm64: fix kernel memory map handling for
> > kaslr-enabled
> > kernel
> >
> > In kernel v4.6, Kernel ASLR (KASLR) is supported on arm64, and the start
> > address of the kernel image can be randomized if CONFIG_RANDOMIZE_BASE is
> > enabled.
> > Even worse, the kernel image is no more mapped in the linear mapping, but
> > in vmalloc area (i.e. below PAGE_OFFSET).
> >
> > Now, according to the kernel's memory.h, converting a virtual address to
> > a physical address should be done like below:
> >
> > phys_addr_t __x = (phys_addr_t)(x); \
> > __x & BIT(VA_BITS - 1) ? (__x & ~PAGE_OFFSET) + PHYS_OFFSET : \
> > (__x - kimage_voffset); })
> >
> > Please note that PHYS_OFFSET is no more equal to the start address of
> > the first usable memory block in SYSTEM RAM due to the fact mentioned
> > above.
> >
> > This patch addresses this change and allows the crash utility to access
> > memory contents with correct addresses.
> > *However*, it is still rough-edged and we need more refinements.
> >
> > 1-1) On a live system, crash with this patch won't work because
> > we currently have no way to know kimage_voffset.
> >
> > 1-2) For a core dump file, we can do simply:
> > $ crash <vmlinux> <vmcore>
> > as long as the file has "NUMBER(kimage_voffset)"
> > (RELOC_AUTO|KASLR is automatically set.)
> >
> > I'm planning to add this enhancement in my next version of kexec/kdump
> > patch, i.e. v17.
> >
> > 2) My current change breaks the backward compatibility.
> > Crash won't work with v4.5 or early kernel partly because I changed
> > the calculation of VA_BITS. (I think that the existing code is wrong.)
> > So far I don't have enough time to look into this issue as I'm
> > focusing
> > on KASLR support.
>
> OK, but you still haven't shown how the current VA_BITS determination is
> not a correct representation of CONFIG_ARM64_VA_BITS.
At least, the current VA_BITS doesn't work with my calculations, which
were derived directly from v4.6's "asm/memory.h".
What are your kernel's PAGE_OFFSET value and CONFIG_ARM64_VA_BITS value?
> > 3) Backtracing a 'panic'ed task fails:
> > crash> bt
> > PID: 999 TASK: ffffffe960081800 CPU: 0 COMMAND: "sh"
> > bt: WARNING: corrupt prstatus? pstate=0x80000000, but no user frame
> > found
> > bt: WARNING: cannot determine starting stack frame for task
> > ffffffe960081800
> >
> > frame->pc indicates that it is a kernel address, so obviously something
> > is wrong. I'm diggin into it.
>
> Can you add a debug statement that displays frame->pc, frame->sp, and
> frame->fp?
I've already identified the cause.
pt_regs->pstate was set mistakenly from "sprsr_el1" in panic().
But I have a difficulty here to fix the problem as we have no direct way
to retrieve a value of the *current* PSTATE.
Interesting. I don't know what you mean by "sprsr_el1" in panic(), but
I just got a report from my github "issues", where the user injected a panic()
call into a kernel module and got the same error as above because of the pstate
value. I asked him to put this hack into arm64.c:
--- a/arm64.c
+++ b/arm64.c
@@ -1711,11 +1711,14 @@ arm64_get_dumpfile_stackframe(struct bt_info *bt, struct
arm64_stackframe *frame
error(WARNING,
"corrupt prstatus? pstate=0x%lx, but no user frame
found\n",
ptregs->pstate);
+ if (INSTACK(frame->sp, bt) && INSTACK(frame->fp,
bt))
+ goto try_kernel;
bt->flags |= BT_REGS_NOT_FOUND;
return FALSE;
}
bt->flags |= BT_USER_SPACE;
} else {
+try_kernel:
frame->sp = ptregs->sp;
frame->fp = ptregs->regs[29];
}
Can you give that a shot?
>
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
> > ---
> > arm64.c | 217
> > ++++++++++++++++++++++++++++++++++++++++++++++++--------------
> > defs.h | 24 +++++--
> > main.c | 7 +-
> > symbols.c | 10 +--
> > 4 files changed, 196 insertions(+), 62 deletions(-)
> >
> > diff --git a/arm64.c b/arm64.c
> > index 34c8c59..6b97093 100644
> > --- a/arm64.c
> > +++ b/arm64.c
> > @@ -72,6 +72,21 @@ static int arm64_get_crash_notes(void);
> > static void arm64_calc_VA_BITS(void);
> > static int arm64_is_uvaddr(ulong, struct task_context *);
> >
> > +ulong
> > +arm64_VTOP(ulong addr)
> > +{
> > + if (!(machdep->flags & NEW_VMEMMAP) ||
> > + (addr >= machdep->machspec->page_offset)) {
> > + return machdep->machspec->phys_offset
> > + + (addr - machdep->machspec->page_offset);
> > + } else {
> > + if (machdep->machspec->kimage_voffset)
> > + return addr - machdep->machspec->kimage_voffset;
> > + else /* no randomness */
> > + return machdep->machspec->phys_offset
> > + + (addr - machdep->machspec->vmalloc_start_addr);
> > + }
> > +}
> >
> > /*
> > * Do all necessary machine-specific setup here. This is called several
> > times
> > @@ -81,6 +96,7 @@ void
> > arm64_init(int when)
> > {
> > ulong value;
> > + char *string;
> > struct machine_specific *ms;
> >
> > #if defined(__x86_64__)
> > @@ -102,9 +118,34 @@ arm64_init(int when)
> > if (machdep->cmdline_args[0])
> > arm64_parse_cmdline_args();
> > machdep->flags |= MACHDEP_BT_TEXT;
> > +
> > + ms = machdep->machspec;
> > + if (!ms->kimage_voffset &&
> > + (string = pc->read_vmcoreinfo("NUMBER(kimage_voffset)")))
{
> > + ms->kimage_voffset = htol(string, QUIET, NULL);
> > + free(string);
> > + }
>
> It doesn't make sense to call pc->read_vmcoreinfo() if !DUMPFILE().
Do you think so?
By default, pc->read_vmcoreinfo is set to no_vmcoreinfo.
So it doesn't hurt anything.
Right, just a suggestion...
> > +
> > + if (ms->kimage_voffset) {
> > + machdep->flags |= NEW_VMEMMAP;
> > +
> > + /* if not specified in command line */
> > + if (!kt->relocate && !(kt->flags2 & (RELOC_AUTO|KASLR)))
> > + kt->flags2 |= (RELOC_AUTO|KASLR);
> > + }
> > +
>
> If CONFIG_RANDOMIZE_BASE is NOT set, should RELOC_AUTO|KASLR be set? They
> don't get set on my live system.
Not necessarily, but the current implementation of derive_kaslr_offset()
just works even in this case.
Ahah -- cool!
(This is the reason that I don't think we need a VMCOREINFO for
CONFIG_RANDOMIZE_BASE.)
Well, as long as there's a way to avoid that unnecessary/confusing
warning message for non-KASLR new-vmemmap kernels.
I also wonder whether makedumpfile could use it?
> > break;
> >
> > case PRE_GDB:
> > + /* This check is somewhat redundant */
> > + if (kernel_symbol_exists("kimage_voffset")) {
> > + machdep->flags |= NEW_VMEMMAP;
> > +
> > + if (!machdep->machspec->kimage_voffset)
> > + error(WARNING,
> > + "kimage_voffset not identified for kaslr kernel\n");
> > + }
> > +
>
> As mentioned in the general discussion above, the WARNING above makes
> no sense if CONFIG_RANDOMIZE_BASE is not configured.
What about if I change the message "for v4.6 or later kernel"?
But again, I do not want the message at all if it's not applicable.
Worse case, you could set a flag, and then if/when the crash session
ultimately craps out if it can't figure things out, then the message
could be displayed post-mortem.
> I'm now thinking that CONFIG_RANDOMIZE_BASE (and maybe
others like KASAN?)
> should be passed in the dumpfile with VMCOREINFO_CONFIG().
Regarding KASAN, we only need to check for an existence of "kasan_init"
symbol as you recommended before :)
Nice.
>
> > if (!machdep->pagesize) {
> > /*
> > * Kerneldoc Documentation/arm64/booting.txt describes
> > @@ -160,16 +201,35 @@ arm64_init(int when)
> > machdep->pagemask = ~((ulonglong)machdep->pageoffset);
> >
> > arm64_calc_VA_BITS();
> > - machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> > + ms = machdep->machspec;
> > + ms->page_offset = ARM64_PAGE_OFFSET;
> > + /* FIXME: idmap for NEW_VMEMMAP */
> > machdep->identity_map_base = ARM64_PAGE_OFFSET;
> > - machdep->machspec->userspace_top = ARM64_USERSPACE_TOP;
> > - machdep->machspec->modules_vaddr = ARM64_MODULES_VADDR;
> > - machdep->machspec->modules_end = ARM64_MODULES_END;
> > - machdep->machspec->vmalloc_start_addr = ARM64_VMALLOC_START;
> > - machdep->machspec->vmalloc_end = ARM64_VMALLOC_END;
> > - machdep->kvbase = ARM64_VMALLOC_START;
> > - machdep->machspec->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > - machdep->machspec->vmemmap_end = ARM64_VMEMMAP_END;
> > + machdep->kvbase = ARM64_VA_START;
> > + ms->userspace_top = ARM64_USERSPACE_TOP;
> > + if (machdep->flags & NEW_VMEMMAP) {
> > + struct syment *sp;
> > +
> > + sp = kernel_symbol_search("_text");
> > + ms->kimage_text = (sp ? sp->value : 0);
> > + sp = kernel_symbol_search("_end");
> > + ms->kimage_end = (sp ? sp->value : 0);
> > +
> > + ms->modules_vaddr = ARM64_VA_START;
> > + if (kernel_symbol_exists("kasan_init"))
> > + ms->modules_vaddr += ARM64_KASAN_SHADOW_SIZE;
> > + ms->modules_end = ms->modules_vaddr
> > + + ARM64_MODULES_VSIZE -1;
> > +
> > + ms->vmalloc_start_addr = ms->modules_end + 1;
> > + } else {
> > + ms->modules_vaddr = ARM64_PAGE_OFFSET - MEGABYTES(64);
> > + ms->modules_end = ARM64_PAGE_OFFSET - 1;
> > + ms->vmalloc_start_addr = ARM64_VA_START;
> > + }
> > + ms->vmalloc_end = ARM64_VMALLOC_END;
> > + ms->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > + ms->vmemmap_end = ARM64_VMEMMAP_END;
> >
> > switch (machdep->pagesize)
> > {
> > @@ -232,8 +292,6 @@ arm64_init(int when)
> > machdep->stacksize = ARM64_STACK_SIZE;
> > machdep->flags |= VMEMMAP;
> >
> > - arm64_calc_phys_offset();
> > -
> > machdep->uvtop = arm64_uvtop;
> > machdep->kvtop = arm64_kvtop;
> > machdep->is_kvaddr = generic_is_kvaddr;
> > @@ -262,6 +320,10 @@ arm64_init(int when)
> > machdep->dumpfile_init = NULL;
> > machdep->verify_line_number = NULL;
> > machdep->init_kernel_pgd = arm64_init_kernel_pgd;
> > +
> > + /* use machdep parameters */
> > + arm64_calc_phys_offset();
> > +
>
> This doesn't make any sense to me. I understand you've done it because
> you are now doing a readmem() in arm64_calc_phys_offset(). But readmem()
> calls are never done this early. And in fact, on the live system the
> readmem() in arm64_calc_phys_offset() fails like so (running "crash
-d4"):
>
> <readmem: fffffc0008d70bb0, KVADDR, "memstart_addr", 8, (ROE|Q),
> 3ffcc2842c0>
> <read_memory_device: addr: fffffc0008d70bb0 paddr: d70bb0 cnt: 8>
> crash: read error: kernel virtual address: fffffc0008d70bb0 type:
> "memstart_addr"
>
> It fails above because it needs the phys_base in order to do the readmem().
> During
> runtime it has it:
This is because we don't know kimage_voffset on a live system.
(and so I mentioned this issue in my commit log.)
Right -- which is a good reason for NOT using readmem() at that point
in time. I can also envision a scenario where readmem() may inadvertently
"work" (i.e. depending upon where phys_base really is), and then return
something bogus from an unintended location.
As far as kdump case is concerned, readmem() at this point just
works
for any address in kernel image. This is my intentional use of this function.
I understand, but you're subverting the subsequent dumpfile-specific
calls later in the function. It is precisely *their* responsibility
to return the value of phys_offset.
> crash> set debug 4
> debug: 4
> crash> rd memstart_addr
> <addr: fffffc0008d70bb0 count: 1 flag: 490 (KVADDR)>
> <readmem: fffffc0008d70bb0, KVADDR, "64-bit KVADDR", 8, (FOE),
> 3ffcbfbfc28>
> <read_memory_device: addr: fffffc0008d70bb0 paddr: 4000d70bb0 cnt: 8>
> fffffc0008d70bb0: 0000004000000000 ....@...
> crash>
>
> So without kimage_voffset, that readmem(), how would that readmem()
> possibly work?
> You need the *contents* of memstart_addr in order to *read* memstart_addr.
>
> Now, if it kimage_voffset were passed into the vmcore, then I guess you
> could
> read it. But why do it that way? That's what diskdump_get_phys_base() and
> kdump_get_phys_base() are supposed to do.
Because I believe that this is the *common* place to do it either for
diskdump, or kdump ( or even a live system if we know kimage_voffset.)
Well, as I have shown, it won't work on my live system because of the
chicken-and-egg scenario. Let's worry about live system analysis after
we get kdump straightened out.
>
> > break;
> >
> > case POST_GDB:
> > @@ -409,6 +471,8 @@ arm64_dump_machdep_table(ulong arg)
> > fprintf(fp, "%sIRQ_STACKS", others++ ? "|" :
"");
> > if (machdep->flags & MACHDEP_BT_TEXT)
> > fprintf(fp, "%sMACHDEP_BT_TEXT", others++ ? "|" :
"");
> > + if (machdep->flags & NEW_VMEMMAP)
> > + fprintf(fp, "%sNEW_VMEMMAP", others++ ? "|" :
"");
> > fprintf(fp, ")\n");
> >
> > fprintf(fp, " kvbase: %lx\n", machdep->kvbase);
> > @@ -494,6 +558,8 @@ arm64_dump_machdep_table(ulong arg)
> > ms = machdep->machspec;
> >
> > fprintf(fp, " machspec: %lx\n", (ulong)ms);
> > + fprintf(fp, " memory map layout: %s\n",
> > + (machdep->flags & NEW_VMEMMAP) ? "new" :
"old");
>
> As before, these internal "help" structure dumps just show existing
> structure fields,
> and NEW_VMMEMAP is already shown in the flags.
I put it as useful information for those people who might have got
any concerns about those odd numbers/kernel layout.
But if you don't like it, I will remove it.
>
> > fprintf(fp, " VA_BITS: %ld\n", ms->VA_BITS);
> > fprintf(fp, " userspace_top: %016lx\n",
ms->userspace_top);
> > fprintf(fp, " page_offset: %016lx\n",
ms->page_offset);
> > @@ -503,6 +569,11 @@ arm64_dump_machdep_table(ulong arg)
> > fprintf(fp, " modules_end: %016lx\n",
ms->modules_end);
> > fprintf(fp, " vmemmap_vaddr: %016lx\n",
ms->vmemmap_vaddr);
> > fprintf(fp, " vmemmap_end: %016lx\n",
ms->vmemmap_end);
> > + if (machdep->flags & NEW_VMEMMAP) {
> > + fprintf(fp, " kimage_text: %016lx\n",
ms->kimage_text);
> > + fprintf(fp, " kimage_end: %016lx\n",
ms->kimage_end);
> > + fprintf(fp, " kimage_voffset: %016lx\n",
ms->kimage_voffset);
> > + }
> > fprintf(fp, " phys_offset: %lx\n", ms->phys_offset);
> > fprintf(fp, "__exception_text_start: %lx\n",
> > ms->__exception_text_start);
> > fprintf(fp, " __exception_text_end: %lx\n",
ms->__exception_text_end);
> > @@ -543,6 +614,42 @@ arm64_dump_machdep_table(ulong arg)
> > }
> > }
> >
> > +static int
> > +arm64_parse_machdep_arg_l(char *argstring, char *param, ulong *value)
> > +{
> > + int len;
> > + int megabytes = FALSE;
> > + char *p;
> > +
> > + len = strlen(param);
> > + if (!STRNEQ(argstring, param) || (argstring[len] != '='))
> > + return FALSE;
> > +
> > + if ((LASTCHAR(argstring) == 'm') ||
> > + (LASTCHAR(argstring) == 'M')) {
> > + LASTCHAR(argstring) = NULLCHAR;
> > + megabytes = TRUE;
> > + }
> > +
> > + p = argstring + len + 1;
> > + if (strlen(p)) {
> > + int flags = RETURN_ON_ERROR | QUIET;
> > + int err = 0;
> > +
> > + if (megabytes) {
> > + *value = dtol(p, flags, &err);
> > + if (!err)
> > + *value = MEGABYTES(*value);
> > + } else {
> > + *value = htol(p, flags, &err);
> > + }
> > +
> > + if (!err)
> > + return TRUE;
> > + }
> > +
> > + return FALSE;
> > +}
> >
> > /*
> > * Parse machine dependent command line arguments.
> > @@ -554,11 +661,10 @@ arm64_dump_machdep_table(ulong arg)
> > static void
> > arm64_parse_cmdline_args(void)
> > {
> > - int index, i, c, err;
> > + int index, i, c;
> > char *arglist[MAXARGS];
> > char buf[BUFSIZE];
> > char *p;
> > - ulong value = 0;
> >
> > for (index = 0; index < MAX_MACHDEP_ARGS; index++) {
> > if (!machdep->cmdline_args[index])
> > @@ -580,39 +686,23 @@ arm64_parse_cmdline_args(void)
> > c = parse_line(buf, arglist);
> >
> > for (i = 0; i < c; i++) {
> > - err = 0;
> > -
> > - if (STRNEQ(arglist[i], "phys_offset=")) {
> > - int megabytes = FALSE;
> > - int flags = RETURN_ON_ERROR | QUIET;
> > -
> > - if ((LASTCHAR(arglist[i]) == 'm') ||
> > - (LASTCHAR(arglist[i]) == 'M')) {
> > - LASTCHAR(arglist[i]) = NULLCHAR;
> > - megabytes = TRUE;
> > - }
> > -
> > - p = arglist[i] + strlen("phys_offset=");
> > - if (strlen(p)) {
> > - if (megabytes)
> > - value = dtol(p, flags, &err);
> > - else
> > - value = htol(p, flags, &err);
> > - }
> > -
> > - if (!err) {
> > - if (megabytes)
> > - value = MEGABYTES(value);
> > -
> > - machdep->machspec->phys_offset = value;
> > -
> > - error(NOTE,
> > - "setting phys_offset to: 0x%lx\n\n",
> > - machdep->machspec->phys_offset);
> > + if (arm64_parse_machdep_arg_l(arglist[i],
> > + "phys_offset",
> > + &machdep->machspec->phys_offset)) {
> > + error(NOTE,
> > + "setting phys_offset to: 0x%lx\n\n",
> > + machdep->machspec->phys_offset);
> > +
> > + machdep->flags |= PHYS_OFFSET;
> > + continue;
> > + } else if (arm64_parse_machdep_arg_l(arglist[i],
> > + "kimage_voffset",
> > + &machdep->machspec->kimage_voffset)) {
> > + error(NOTE,
> > + "setting kimage_voffset to: 0x%lx\n\n",
> > + machdep->machspec->kimage_voffset);
> >
> > - machdep->flags |= PHYS_OFFSET;
> > - continue;
> > - }
> > + continue;
> > }
> >
> > error(WARNING, "ignoring --machdep option: %s\n",
> > @@ -627,10 +717,20 @@ arm64_calc_phys_offset(void)
> > {
> > struct machine_specific *ms = machdep->machspec;
> > ulong phys_offset;
> > + struct syment *sp;
> > + ulong value;
> >
> > if (machdep->flags & PHYS_OFFSET) /* --machdep override */
> > return;
> >
> > + sp = kernel_symbol_search("memstart_addr");
> > + if (sp && readmem(sp->value, KVADDR, (char *)&value,
sizeof(value),
> > + "memstart_addr", QUIET|RETURN_ON_ERROR)) {
> > + ms->phys_offset = value;
> > + machdep->flags |= PHYS_OFFSET;
> > + return;
> > + }
> > +
>
> Also, I may be missing something, but the PHYS_OFFSET flag means that it
> was
> passed in via --machdep, and is only used in this function to avoid any
> further
> attempts to figure out what it is. Here you are setting the flag for no
> particular
> reason that I can see. Why?
Well, I don't remember very well. If not necessary, I will remove it.
OK, I guarantee it's useless in this case...
> But again, doing a readmem() call this early is setting a bad
precedent --
> they have
> *always* been delayed until after gdb_session_init() in main_loop(). And
> the work
> of determining phys_base should be done by the dumpfile call-out functions
> that
> already exist.
>
>
> > /*
> > * Next determine suitable value for phys_offset. User can override
> > this
> > * by passing valid '--machdep phys_offset=<addr>' option.
> > @@ -2377,6 +2477,11 @@ arm64_IS_VMALLOC_ADDR(ulong vaddr)
> > {
> > struct machine_specific *ms = machdep->machspec;
> >
> > + if ((machdep->flags & NEW_VMEMMAP) &&
> > + (vaddr >= machdep->machspec->kimage_text) &&
> > + (vaddr <= machdep->machspec->kimage_end))
> > + return FALSE;
> > +
> > return ((vaddr >= ms->vmalloc_start_addr && vaddr <=
> > ms->vmalloc_end) ||
> > ((machdep->flags & VMEMMAP) &&
> > (vaddr >= ms->vmemmap_vaddr && vaddr <=
> > ms->vmemmap_end))
> > ||
> > @@ -2407,7 +2512,11 @@ arm64_calc_VA_BITS(void)
> >
> > for (bitval = highest_bit_long(value); bitval; bitval--) {
> > if ((value & (1UL << bitval)) == 0) {
> > +#if 1
> > + machdep->machspec->VA_BITS = bitval + 1;
> > +#else /* FIXME: bug? */
> > machdep->machspec->VA_BITS = bitval + 2;
> > +#endif
> > break;
> > }
> > }
>
> Again, a non-starter until the above is resolved -- but you know that
> already...
Just a reminder for now.
>
> > @@ -2459,10 +2568,22 @@ arm64_calc_virtual_memory_ranges(void)
> > break;
> > }
> >
> > - vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift))
*
> > SIZE(page), PUD_SIZE);
> > + if (machdep->flags & NEW_VMEMMAP)
> > +#define STRUCT_PAGE_MAX_SHIFT 6
> > + vmemmap_size = 1UL << (ms->VA_BITS - machdep->pageshift - 1
> > + + STRUCT_PAGE_MAX_SHIFT);
> > + else
> > + vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift))
*
> > SIZE(page), PUD_SIZE);
> > +
> > vmalloc_end = (ms->page_offset - PUD_SIZE - vmemmap_size - SZ_64K);
> > - vmemmap_start = vmalloc_end + SZ_64K;
> > - vmemmap_end = vmemmap_start + vmemmap_size;
> > +
> > + if (machdep->flags & NEW_VMEMMAP) {
> > + vmemmap_start = ms->page_offset - vmemmap_size;
> > + vmemmap_end = ms->page_offset;
> > + } else {
> > + vmemmap_start = vmalloc_end + SZ_64K;
> > + vmemmap_end = vmemmap_start + vmemmap_size;
> > + }
> >
> > ms->vmalloc_end = vmalloc_end - 1;
> > ms->vmemmap_vaddr = vmemmap_start;
> > diff --git a/defs.h b/defs.h
> > index a09fa9a..2bbb55b 100644
> > --- a/defs.h
> > +++ b/defs.h
> > @@ -2844,8 +2844,8 @@ typedef u64 pte_t;
> >
> > #define PTOV(X) \
> > ((unsigned
> >
long)(X)-(machdep->machspec->phys_offset)+(machdep->machspec->page_offset))
> > -#define VTOP(X) \
> > - ((unsigned
> >
long)(X)-(machdep->machspec->page_offset)+(machdep->machspec->phys_offset))
> > +
> > +#define VTOP(X) arm64_VTOP((ulong)(X))
> >
> > #define USERSPACE_TOP (machdep->machspec->userspace_top)
> > #define PAGE_OFFSET (machdep->machspec->page_offset)
> > @@ -2938,18 +2938,23 @@ typedef signed int s32;
> > #define VM_L3_4K (0x10)
> > #define KDUMP_ENABLED (0x20)
> > #define IRQ_STACKS (0x40)
> > +#define NEW_VMEMMAP (0x80)
> >
> > /*
> > * sources: Documentation/arm64/memory.txt
> > * arch/arm64/include/asm/memory.h
> > * arch/arm64/include/asm/pgtable.h
> > */
> > -
> > -#define ARM64_PAGE_OFFSET ((0xffffffffffffffffUL) <<
> > (machdep->machspec->VA_BITS - 1))
> > +#define ARM64_VA_START ((0xffffffffffffffffUL) \
> > + << machdep->machspec->VA_BITS)
> > +#define ARM64_PAGE_OFFSET ((0xffffffffffffffffUL) \
> > + << (machdep->machspec->VA_BITS - 1))
> > #define ARM64_USERSPACE_TOP ((1UL) <<
machdep->machspec->VA_BITS)
> > -#define ARM64_MODULES_VADDR (ARM64_PAGE_OFFSET - MEGABYTES(64))
> > -#define ARM64_MODULES_END (ARM64_PAGE_OFFSET - 1)
> > -#define ARM64_VMALLOC_START ((0xffffffffffffffffUL) <<
> > machdep->machspec->VA_BITS)
> > +
> > +/* only used for v4.6 or later */
> > +#define ARM64_MODULES_VSIZE MEGABYTES(128)
> > +#define ARM64_KASAN_SHADOW_SIZE (1UL <<
(machdep->machspec->VA_BITS -
> > 3))
> > +
> > /*
> > * The following 3 definitions are the original values, but are obsolete
> > * for 3.17 and later kernels because they are now build-time
> > calculations.
> > @@ -3028,6 +3033,10 @@ struct machine_specific {
> > ulong kernel_flags;
> > ulong irq_stack_size;
> > ulong *irq_stacks;
> > + /* only needed for kaslr-enabled kernel */
> > + ulong kimage_voffset;
> > + ulong kimage_text;
> > + ulong kimage_end;
> > };
>
> The comment is wrong -- the kimage_text and kimage_end values are needed
> for
> NEW_VMEMMAP kernels that are NOT configured with CONFIG_RANDOMIZE_BASE.
> Unless you consider my live system "kaslr-enabled"?
I will change it to "for v4.6 or later".
I have limited (or no) time to investigate a live-system case, and so if
you have any suggestions or your own patch, it is very much appreciated.
Yep... but I will build upon your solid foundation.
If I see anything interesting/different after my live system upgrade, I'll let
you know.
Thanks,
Dave