On Fri, May 05, 2006 at 10:03:54AM -0400, Dave Anderson wrote:
Maneesh Soni wrote:
> Hi Dave,
>
> Following is a list of a few proposed improvements to crash utility though
> for most of the items there are no names associated.
>
> Please let us know if these look useful or not. And if found appropriate
> would it be possible for you to merge these with the crash todo list.
>
> Thanks to Badari Pulavarty, Richard Moore and Vara Prasad for the inputs.
>
> Regards
> Maneesh
>
> --------------------------------------------------------------------------------
> DESCRIPTION:
> clean & correct stack back traces on platforms ALL the time.
> - x86_64 (currently wrong and need fixing)
> - frame pointers off ? (on x86 we still don't have frame pointers on)
>
> RESOLUTION STATUS: Work-in-progress by Rachita Kothiyal <rachita(a)in.ibm.com>
Certainly a welcome task. I suggest segregating the code in a separate
file (as done with lkcd_x86_trace.c), and the new entry point can simply
be plugged into machdep->back_trace function pointer at init time.
There should also be an "out" to allow it to be set back to use the
current x86_64_low_budget_back_trace_cmd(). Also, if it doesn't
support -fomit-frame-pointer, it's not worth doing.
ok, thanks for the suggestion. I have added this in the modified list
appended below and request Rachita to keep this in mind.
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
> Code restructuring:
> - move as much code for advanced commands to libraries so that
> crash is at least able to open the dump image and perform minimal
> set of commands like bt, dump dmesg log, disassemble etc. irrespective
> of kernel version.
> - code is hard to read & understand - need to re-write some of the
> basic subsystems like memory mapping, pagetable management etc
>
> RESOLUTION STATUS:
> Work-in-progress by Dave Wilder <dwilder(a)us.ibm.com> and
> Maneesh Soni <maneesh(a)in.ibm.com>
>
I don't quite understand how moving code to libraries is going to
achieve the goal here. Things in some of the various *_init() functions
could certainly be streamlined (or skipped) in order to make it more
likely to make it to the first prompt. For example, the task table initialization
could be made to simply fill in the context data for just the panic task.
(But it almost sounds like you just want to use gdb alone for the minimal
set of commands you've listed?)
The main aim is to have crash at least make it to the first prompt. And for
advanced commands we can either try postponing their *_init() functions until
the first invocation or keep them in libraries.
As far as "re-writes" are concerned, please keep in mind the
necessity of backwards-compatibility. I'd much rather keep the current
code -- that's known to work -- in place, and if you come up with
something new, or re-shuffled, make it only callable when the kernel
is of a known kernel version or later.
The point is, let's not re-invent the wheel just for the purpose of
re-inventing the wheel.
Agreed, backward compatibility should be maintained.
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
> Crash & kernel version independence:
> kernel headers & code - reuse ? It would be nice to figure
> out a way to include kernel headers and sections of kernel code
> to do hard stuff (like memory mapping functions page_to_pfn,
> pfn_to_page, pagetable decoding etc..).
>
> RESOLUTION STATUS:
> Work-in-progress by Dave Wilder <dwilder(a)us.ibm.com> and
> Maneesh Soni <maneesh(a)in.ibm.com>
>
I don't particularly like this suggestion. (I thought we just went through
a problem where Ubuntu kernels don't even have kernel headers?)
As far as code reuse, we already do that in a number of places, so
I guess that's OK.
And there just never seems to be a "one-size-fits-all" set of
kernel functions/macros that covers all bases over the life of
the kernel and each processor type.
But as always, I'm open to suggestion.
Actually, the final form of the solution is still not decided. It could be kernel
headers, or some binary (library). There have been some patches regarding
"make headers_install" which also need some investigation to see if they
can be of some help here.
<snip>
> DESCRIPTION:
> per-cpu info (like stacks traces)
>
> RESOLUTION STATUS: TBD
Needs more of a description...
Badari, I guess you meant some command to dump all per-cpu data for each cpu,
or some specific data? Please correct me here.
>
> --------------------------------------------------------------------------------
> DESCRIPTION:
> User space enhancements
> - show user space stack backtrace, if present in the dump file,
> - ability to link user space namelist (debug object files),
>
> RESOLUTION STATUS: TBD
>
I thought crash was a kernel [crash/live-system] analyzer?
You currently can add user-space debug data with "add-symbol-file",
which loads the debug data and symbols into gdb. I have done this
kind of thing, but it's been an "almost-never" kind of situation, where
I've wanted to display a user program's data structure.
But if you want to start throwing in this kind of user-space stuff,
please just keep it segregated.
OK, I hope keeping all such commands in a separate library would not be
objectionable.
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
> Platform specific enhancements
> - Establish CPU registers at the time of exceptions in the current context
> - Ability to handle CPU registers from current context using symbols in
> expressions
> - Ability to format basic processor structures like LDT, GDT, task gates
> for x86 arch
>
Not clear on what "establishing" CPU registers means. We already
dump exception frames.
I guess you mean to be able to use a register connotation in certain
commands, as opposed to the address contained in the register?
Right. We can have commands to dump specific register contents or use
registers as arguments to some commands.
That's potentially messy, because it puts processor-specific stuff
in processor-neutral code.
Probably we can use extended libraries for such commands to reduce the clutter.
As far as the LDT, GDT, task gates formatting, that's fine.
>
> RESOLUTION STATUS: TBD
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
> cross architecture support for crash
>
> RESOLUTION STATUS: TBD
No way -- we've been through this before. It is essentially a complete re-write.
If you want this, make a new command entirely.
OK, but this looks like a high priority for some and a low priority for
others. In any case, this should be done in an acceptable way.
>
>
> --------------------------------------------------------------------------------
I've made my personal feelings on these kinds of things known before,
which is to take a "minimalist" approach. Every new bell and whistle
is virtually guaranteed to break as the kernel churns. And they all
require an additional support burden. If I had my druthers, crash
would have less rather than more at this point.
But I understand that this has become a community project, and
with the few exceptions above, I'm open to all patch suggestions.
Thanks. I have appended the modified list below, keeping your suggestions in
mind.
Maneesh
--------------------------------------------------------------------------------
DESCRIPTION:
clean & correct stack back traces on platforms ALL the time.
- x86_64 (currently wrong and needs fixing)
- segregate the code in a separate file (as done with lkcd_x86_trace.c),
and the new entry point can simply be plugged into machdep->back_trace
function pointer at init time.
- There should also be an "out" to allow it to be set back to use the
current x86_64_low_budget_back_trace_cmd(). Also, if it doesn't
support -fomit-frame-pointer, it's not worth doing.
RESOLUTION STATUS: Work-in-progress by Rachita Kothiyal <rachita(a)in.ibm.com>
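
The plug-in-with-an-"out" arrangement described in this item could look
roughly like the sketch below. It is only an illustration under stated
assumptions: struct machdep_sketch, the enum, and all three functions are
hypothetical stand-ins, not crash's real machdep table or back trace
signatures, and how the "out" would be selected (option, environment, etc.)
is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers so a caller can tell which unwinder ran. */
enum bt_kind { BT_LOW_BUDGET, BT_NEW };

/* Hypothetical stand-in for the per-architecture dispatch table. */
struct machdep_sketch {
    enum bt_kind (*back_trace)(void);
};

/* Stands in for the existing x86_64_low_budget_back_trace_cmd(). */
static enum bt_kind low_budget_back_trace(void)
{
    return BT_LOW_BUDGET;
}

/* Stands in for the new unwinder, which would live in its own file. */
static enum bt_kind new_back_trace(void)
{
    return BT_NEW;
}

/*
 * Called once at init time. use_old is the "out": when non-zero the
 * function pointer is set back to the current implementation.
 */
static void back_trace_init(struct machdep_sketch *m, int use_old)
{
    assert(m != NULL);
    m->back_trace = use_old ? low_budget_back_trace : new_back_trace;
}
```

Callers would keep invoking m->back_trace() and never need to know which
implementation is installed, which is what keeps the new code segregated.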
--------------------------------------------------------------------------------
DESCRIPTION:
Code restructuring:
- streamline the *_init() functions so that crash is at least
able to open the dump image and perform minimal set of commands
like bt, dump dmesg log, disassemble etc. irrespective of kernel
version.
- code is hard to read & understand - need to re-write some of the
basic subsystems like memory mapping, pagetable management etc
while maintaining backward compatibility.
RESOLUTION STATUS:
Work-in-progress by Dave Wilder <dwilder(a)us.ibm.com> and
Maneesh Soni <maneesh(a)in.ibm.com>
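
One possible reading of the streamlining, following the idea raised earlier
in this thread of postponing an *_init() function until the first invocation
of the command that needs it, is sketched below. All names here (struct
subsystem, expensive_init, ensure_initialized, advanced_cmd) are hypothetical
and do not correspond to existing crash code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem state; not a crash data structure. */
struct subsystem {
    int initialized;                  /* non-zero once init succeeded */
    int init_calls;                   /* how often the costly setup ran */
    int (*init)(struct subsystem *);  /* returns 0 on success */
};

/* Stands in for a costly *_init() function. */
static int expensive_init(struct subsystem *s)
{
    s->init_calls++;
    return 0;
}

/* Run the subsystem's init on first use only; return 0 on success. */
static int ensure_initialized(struct subsystem *s)
{
    assert(s != NULL && s->init != NULL);
    if (s->initialized)
        return 0;
    if (s->init(s) != 0)
        return -1;  /* stay uninitialized so a later call can retry */
    s->initialized = 1;
    return 0;
}

/* An "advanced" command: pays the init cost only when it is invoked. */
static int advanced_cmd(struct subsystem *s)
{
    if (ensure_initialized(s) != 0)
        return -1;
    /* ... the command body would go here ... */
    return 0;
}
```

With this shape, a failing init only disables the command that depends on
it, and start-up can still reach the first prompt.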
--------------------------------------------------------------------------------
DESCRIPTION:
Crash & kernel version independence:
- kernel headers & code - reuse ? It would be nice to figure
out a way to include kernel headers and sections of kernel code
to do hard stuff (like memory mapping functions page_to_pfn,
pfn_to_page, pagetable decoding etc..).
RESOLUTION STATUS:
Work-in-progress by Dave Wilder <dwilder(a)us.ibm.com> and
Maneesh Soni <maneesh(a)in.ibm.com>
--------------------------------------------------------------------------------
DESCRIPTION:
Mini report:
- The goal of this is to produce a summary report of the common information
that is used to track problems. For many problems we probably don't need
the whole dump shipped, and as you have probably figured out by now, these
huge dump files are not easy to ship and store.
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
Automatic verification of the dump:
- When you get a dump to look at a problem, there are a few common tasks
one performs; the idea here is to automate those tasks and provide
a simple interface in the tool. Another possibility is automatic
verification of important data structures. For example, if the task
list says there are 30 tasks, this feature automatically walks the
list and counts the entries to verify that there are 30. If 30
entries are not found, this may give a clue to some kind of
corruption.
RESOLUTION STATUS: TBD
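
The task-count check described in this item might be sketched as follows.
This is an assumption-laden illustration: struct list_node only imitates the
layout of the kernel's list_head, count_list and list_matches are
hypothetical names, and the sketch follows ordinary host pointers, whereas a
real implementation would have to read each link out of the dump image.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked node, in the style of list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/*
 * Walk the list from head and count entries (head itself excluded).
 * Returns the count, or -1 if a link is NULL, a back pointer disagrees
 * with the forward walk, or more than max_entries nodes are seen
 * (most likely a loop that never returns to head).
 */
static long count_list(const struct list_node *head, long max_entries)
{
    long n = 0;
    const struct list_node *prev = head;
    const struct list_node *cur;

    assert(head != NULL);
    for (cur = head->next; cur != head; prev = cur, cur = cur->next) {
        if (cur == NULL || cur->prev != prev)
            return -1;
        if (++n > max_entries)
            return -1;
    }
    return n;
}

/* 1 if the walked count equals the recorded count, else 0. */
static int list_matches(const struct list_node *head, long recorded)
{
    return count_list(head, recorded) == recorded;
}
```

A mismatch or a -1 result would not say what went wrong, only that the list
and its recorded count disagree, which is the "clue" this item is after.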
--------------------------------------------------------------------------------
DESCRIPTION:
function arguments:
- Display arguments in the stack trace. At present, we do not have
support for PPC64 and x86_64. On PPC64, the user can retrieve arguments
only for the top-level frame, from pt_regs. However, the user can dump
the complete stack frame and read the arguments from it. So it is a
manual process and requires some expertise with the stack frame layout.
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
local variables:
- Facilitate possible display of local variables with stack frames.
Since we are using a debug vmlinux, we can find local variable
locations from DWARF2.
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
better assembly & source language, line# display in disassembly
- interacting with gdb might help, since for any text address gdb has
the associated line number data, but there might be some confusion
depending upon the source of the text.
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
per-cpu info (like stack traces)
- Display all or specific per cpu data for all cpus or specific cpu.
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
User space enhancements
- show user space stack backtrace, if present in the dump file,
- ability to link user space namelist (debug object files),
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
Platform specific enhancements
- Establish CPU registers at the time of exceptions in the current
context
- Ability to handle CPU registers from current context using symbols
in expressions
- Ability to format basic processor structures like LDT, GDT, task
gates for x86 arch
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
cross architecture support for crash
RESOLUTION STATUS: TBD
--------------------------------------------------------------------------------
DESCRIPTION:
scripting support
- integrating scripting support with a Perl- or Python-like language;
"Alicia" can be one example or the solution itself.
RESOLUTION STATUS: TBD