Re: [Crash-utility] Running idle threads show wrong CPU numbers
by Dave Anderson
----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> Hi Dave,
>
> I have a problem with a dump where I have defined five CPUs and two of
> them are offline. In fact the logical CPUs are defined as follows:
>
> 0 on
> 1 on
> 2 off
> 3 off
> 4 on
>
> The CPU online map looks correct:
>
> crash> print/x *cpu_online_mask
> $4 = {
> bits = {0x13} ---> b10011
> }
>
> When I issue "ps" I see that all running tasks are idle, but the CPU
> numbers are not correct (0,1,2 and not 0,1,4):
>
>    PID    PPID  CPU      TASK  ST  %MEM     VSZ    RSS  COMM
> >    0       0    0    800ef0  RU   0.0       0      0  [swapper]
> >    0       0    1  18c24240  RU   0.0       0      0  [swapper]
> >    0       0    2  18c2c340  RU   0.0       0      0  [swapper]
>
> I tried to debug the problem, but got stuck somewhere in "task.c". I
> think there is a problem with the idle threads initialization, where the
> online map is not considered.
>
> Maybe you can see the bug immediately. Otherwise I will have to spend more
> effort debugging that problem. I hope not :-)
Does "sys" show 5 or 3 CPUs? I'm guessing it shows 3, but it should show 5.
It looks like the s390/s390x files need to use "get_highest_cpu_online()+1"
(like x86_64 and ppc64) to determine the number of CPUs to account
for. As it is now, they do the following, and would therefore only account
for the first 3 CPUs:
int
s390x_get_smp_cpus(void)
{
	return get_cpus_online();
}

int
s390_get_smp_cpus(void)
{
	return get_cpus_online();
}
Dave
14 years, 10 months
Re: [Crash-utility] Using crash - is a debug kernel required during vmcore collection
by Dave Anderson
----- "Gallus" <gall.cwpl(a)gmail.com> wrote:
> I have a simple question: In order to use crash, the vmcore doesn't
> have to be collected under a "debug" kernel? The symbols can be provided
> later, during the analysis with the crash tool, right?
I'm not sure I understand your question. Are you asking whether,
if the vmcore of a particular kernel was collected and you
do not have the debuginfo vmlinux that is associated
with it, you can still analyze the vmcore?
If that's what you're asking, then yes, there are ways to do that.
You can rebuild the same kernel, and use the newly-built debuginfo
vmlinux file along with the System.map file of the original kernel,
like this:
$ crash vmlinux-built-after-the-fact System.map vmcore
You will have a few restrictions -- such as not being able
to get line-number information from commands that display it.
This is the *only* reason a System.map file is *ever* needed,
and that's because the symbol values of the rebuilt debuginfo
vmlinux file typically do not match those of the original
crashing kernel.
If you're asking whether the secondary kdump kernel needs to
be the same as the crashing kernel, then the answer is no.
Dave
14 years, 11 months
Re: [Crash-utility] crash fails to build with gcc-4.5
by Dave Anderson
----- "Troy Heber" <troy.heber(a)hp.com> wrote:
> When trying to build crash with gcc-4.5 on x86-64 you get:
>
> unwind_x86_32_64.c:50:2: error: initializer element is not constant
> unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[7].offs')
> unwind_x86_32_64.c:50:2: error: initializer element is not constant
> unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[8].offs')
> unwind_x86_32_64.c:50:2: error: initializer element is not constant
> ...
>
> When you start to dig into this you quickly end up playing with lots
> of really fun macros from unwind_x86_64.h. Eventually, you end up
> playing with this one:
>
> #define BUILD_BUG_ON_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)
>
> If you pull this macro out and play with it by itself it seems to
> work fine with both gcc-4.5 and gcc < 4.5. It is only when it is used in
> combination with the other macro expressions that gcc-4.5 fails to
> evaluate it, and I have no clue why.
>
> When looking at the BUILD_BUG_ON_ZERO macro upstream in
> include/linux/kernel.h we can see it has been replaced with this
> version:
>
> #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
>
> It turns out that gcc-4.5 is perfectly happy with the updated version!
>
>
> This was done in commit: 8c87df457cb58fe75b9b893007917cf8095660a0
>
> BUILD_BUG_ON(): fix it and a couple of bogus uses of it
>
> gcc permitting variable length arrays makes the current construct used for
> BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
> controlling expression isn't really constant. Instead, this patch makes
> it so that a bit field gets used here. Consequently, those uses where the
> condition isn't really constant now also need fixing.
>
> Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
> MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
> the expression is compile time constant (__builtin_constant_p() yields
> true), the array is still deemed of variable length by gcc, and hence the
> whole expression doesn't have the intended effect.
>
> It looks like this could end up being a potential bug in gcc. I'll
> file a bug with gcc and try to provide them with a simplified test
> case. However, since this macro changed upstream and acts as a
> workaround for the issue I would propose making the update in crash
> as well.
>
> Troy
I've been tempted to just rip out unwind_x86_32_64.c, unwind_x86_64.h
and unwind_x86.h since they're pretty much useless. The unwind code
in those files is only used if explicitly requested by "set unwind on"
*and* if the kernel supports it (which it hasn't since Jan Beulich's
x86/x86_64 temporary DWARF/unwind stuff was pulled).
But thanks for digging this out -- queued for the next release.
Dave
>
> ---
> diff --git a/unwind_x86_64.h b/unwind_x86_64.h
> index a79c2d5..52fcf7a 100644
> --- a/unwind_x86_64.h
> +++ b/unwind_x86_64.h
> @@ -61,7 +61,7 @@ extern void free_unwind_table(void);
> #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
> #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
> -#define BUILD_BUG_ON_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)
> +#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
> #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
> #define get_unaligned(ptr) (*(ptr))
> //#define __get_user(x,ptr) __get_user_nocheck((x),(ptr),sizeof(*(ptr)))
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
14 years, 11 months
crash fails to build with gcc-4.5
by Troy Heber
When trying to build crash with gcc-4.5 on x86-64 you get:
unwind_x86_32_64.c:50:2: error: initializer element is not constant
unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[7].offs')
unwind_x86_32_64.c:50:2: error: initializer element is not constant
unwind_x86_32_64.c:50:2: error: (near initialization for 'reg_info[8].offs')
unwind_x86_32_64.c:50:2: error: initializer element is not constant
...
When you start to dig into this you quickly end up playing with lots
of really fun macros from unwind_x86_64.h. Eventually, you end up
playing with this one:
#define BUILD_BUG_ON_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)
If you pull this macro out and play with it by itself it seems to
work fine with both gcc-4.5 and gcc < 4.5. It is only when it is used in
combination with the other macro expressions that gcc-4.5 fails to
evaluate it, and I have no clue why.
When looking at the BUILD_BUG_ON_ZERO macro upstream in
include/linux/kernel.h we can see it has been replaced with this
version:
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
It turns out that gcc-4.5 is perfectly happy with the updated version!
This was done in commit: 8c87df457cb58fe75b9b893007917cf8095660a0
BUILD_BUG_ON(): fix it and a couple of bogus uses of it
gcc permitting variable length arrays makes the current construct used for
BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
controlling expression isn't really constant. Instead, this patch makes
it so that a bit field gets used here. Consequently, those uses where the
condition isn't really constant now also need fixing.
Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
the expression is compile time constant (__builtin_constant_p() yields
true), the array is still deemed of variable length by gcc, and hence the
whole expression doesn't have the intended effect.
It looks like this could end up being a potential bug in gcc. I'll
file a bug with gcc and try to provide them with a simplified test
case. However, since this macro changed upstream and acts as a
workaround for the issue I would propose making the update in crash as
well.
Troy
---
diff --git a/unwind_x86_64.h b/unwind_x86_64.h
index a79c2d5..52fcf7a 100644
--- a/unwind_x86_64.h
+++ b/unwind_x86_64.h
@@ -61,7 +61,7 @@ extern void free_unwind_table(void);
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
-#define BUILD_BUG_ON_ZERO(e) (sizeof(char[1 - 2 * !!(e)]) - 1)
+#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
#define get_unaligned(ptr) (*(ptr))
//#define __get_user(x,ptr) __get_user_nocheck((x),(ptr),sizeof(*(ptr)))
14 years, 11 months
An example of crashdc output
by Louis Bouchard
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello everyone,
For those interested, here is an example of crashdc output (BASIC mode):
http://cariblog.kamikamamak.com/crashdc-example-in-basic-mode/
The latest development bits are available here:
http://sourceforge.net/projects/crashdc/files/
FYI, I'm still planning/hoping to have crashdc as part of the crash
utility RPM, but sf.net makes it easier for me to make it publicly
available for early testing.
It is also intended to be made easily available for our internal support
staff.
Kind Regards,
...Louis
- --
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center,
Linux Ambassador, HP
HP Services 1 Ave du Canada
HP France Z.A. de Courtaboeuf
louis.bouchard(a)hp.com 91 947 Les Ulis
http://www.hp.com/go/linux France
http://www.hp.com/fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAktdjHoACgkQDvqokHrhnCyz8wCg8s877PboJkF0ia4XewSR/v0/
GqkAoPcg7qM7wRI7zrWBWdvDASX4lx4c
=r2zq
-----END PGP SIGNATURE-----
14 years, 11 months
Running idle threads show wrong CPU numbers
by Michael Holzheu
Hi Dave,
I have a problem with a dump where I have defined five CPUs and two of
them are offline. In fact the logical CPUs are defined as follows:
0 on
1 on
2 off
3 off
4 on
The CPU online map looks correct:
crash> print/x *cpu_online_mask
$4 = {
bits = {0x13} ---> b10011
}
When I issue "ps" I see that all running tasks are idle, but the CPU
numbers are not correct (0,1,2 and not 0,1,4):
   PID    PPID  CPU      TASK  ST  %MEM     VSZ    RSS  COMM
>    0       0    0    800ef0  RU   0.0       0      0  [swapper]
>    0       0    1  18c24240  RU   0.0       0      0  [swapper]
>    0       0    2  18c2c340  RU   0.0       0      0  [swapper]
I tried to debug the problem, but got stuck somewhere in "task.c". I
think there is a problem with the idle threads initialization, where the
online map is not considered.
Maybe you can see the bug immediately. Otherwise I will have to spend more
effort debugging that problem. I hope not :-)
Michael
14 years, 11 months
Re: [Crash-utility] crash-5.0: Segmentation fault with x86_64_get_active_set
by Dave Anderson
----- "ville mattila" <ville.mattila(a)stonesoft.com> wrote:
> >
> > > Hello,
> > >
> > > I get a segmentation fault from our 64-bit kernel crash.
> > > This crash is caused by "echo c > /proc/sysrq-trigger".
> > > The reason seems to be that x86_64_cpu_pda_init is
> > > not called; at least gdb does not break there.
... [ snip ] ...
A patch for your initialization-time segmentation violation is attached.
... [ snip ] ...
But as for this one:
>
> Btw, the "struct" command caused another segmentation fault.
> Here is gdb bt:
>
> (gdb) bt
> #0 0x00007f74b3524a92 in strcmp () from /lib/libc.so.6
> #1 0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:276
> #2 0x00000000005344ed in lookup_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:228
> #3 0x000000000060019d in c_lex () at c-exp.y:2149
> #4 0x00000000006008f5 in c_parse_internal () at c-exp.c.tmp:1468
> #5 0x00000000006022dd in c_parse () at c-exp.y:2225
> #6 0x000000000055f614 in parse_exp_in_context (stringptr=0x7fffbc2f2260, block=<value optimized out>, comma=<value optimized out>, void_context_p=0, out_subexp=0x0) at parse.c:1094
> #7 0x000000000055f924 in parse_expression (string=0x7fffbc2f2950 "x8664_pda") at parse.c:1144
> #8 0x000000000053291b in gdb_command_funnel (req=0xca2c00) at symtab.c:4992
> #9 0x00000000004c1740 in gdb_interface (req=0xca2c00) at gdb_interface.c:407
> #10 0x00000000004e9dca in datatype_info (name=0xb618a7 "x8664_pda", member=0x0, dm=0x7fffbc2f3620) at symbols.c:4146
> #11 0x00000000004eb1ee in arg_to_datatype (s=0xb618a7 "x8664_pda", dm=0x7fffbc2f3620, flags=524290) at symbols.c:4867
> #12 0x00000000004efa1b in cmd_datatype_common (flags=2048) at symbols.c:4664
> #13 0x000000000045efd9 in exec_command () at main.c:644
> #14 0x000000000045f1fa in main_loop () at main.c:603
> #15 0x00000000005452a9 in captured_command_loop (data=0x120e3c0) at ./main.c:226
> #16 0x00000000005434e4 in catch_errors (func=0x5452a0 <captured_command_loop>, func_args=0x0, errstring=0x7f9d7c "", mask=<value optimized out>) at exceptions.c:520
> #17 0x0000000000544d36 in captured_main (data=<value optimized out>) at ./main.c:924
> #18 0x00000000005434e4 in catch_errors (func=0x544340 <captured_main>, func_args=0x7fffbc2f38b0, errstring=0x7f9d7c "", mask=<value optimized out>) at exceptions.c:520
> #19 0x000000000054412f in gdb_main_entry (argc=<value optimized out>, argv=<value optimized out>) at ./main.c:939
> #20 0x000000000045fece in main (argc=3, argv=0x7fffbc2f3a08) at main.c:517
> (gdb) frame 1
> #1 0x0000000000534284 in lookup_partial_symtab (name=0x120e3c0 "x8664_pda") at symtab.c:276
> 276 if (FILENAME_CMP (name, pst->filename) == 0)
> (gdb) p name
> $4 = 0x120e3c0 "x8664_pda"
> (gdb) p pst
> $5 = (struct partial_symtab *) 0x14d6040
> (gdb) p pst->filename
> $6 = 0x0
> (gdb) p *pst
> $7 = {next = 0x0, filename = 0x0, fullname = 0x0, dirname = 0x0,
> objfile = 0x0, section_offsets = 0x0, textlow = 0, texthigh = 0,
> dependencies = 0x0, number_of_dependencies = 0, globals_offset = 0,
> n_global_syms = 0, statics_offset = 0, n_static_syms = 0, symtab = 0x0,
> read_symtab = 0, read_symtab_private = 0x0, readin = 0 '\0'}
> (gdb)
I cannot reproduce it, even with your supplied kernel/dumpfile pair:
# crash vmlinux-2.6.31.7+up-syms kerneldump-20100114-091104
... [ snip ] ...
crash> struct x8664_pda
struct: invalid data structure reference: x8664_pda
crash>
While walking through the "ALL_PSYMTABS" list of partial_symtabs in
lookup_partial_symtab(), I never see a NULL-filled partial_symtab
structure as you show above. What I do see is 13434 partial_symtab
structures, with pst->filenames starting from:
/workarea5/work/fw/mulperi/ararat/kernel/sg-kernel.git/arch/x86/lib/csum-copy_64.S
until the last one:
/workarea5/work/fw/mulperi/ararat/kernel/sg-kernel.git/arch/x86/kernel/head_64.S
So I don't know what the deal is with that one.
Dave
14 years, 11 months
crashdc : an update
by Louis Bouchard
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello everyone and happy new year.
I hope you will pardon me for hijacking the mailing list. Here is a
quick update on the status of crashdc.
I have now received 'clearance' from my employer to release crashdc to
the public, so it is now available here:
http://crashdc.svn.sourceforge.net/viewvc/crashdc/
I'm in the process of testing the basic functionality on all kernel
flavors delivered by each distro (xen, pae, bigsmp, xenpae, etc.) for i386.
I will run the same tests for x86_64 after that.
Since testing on IA64 will require setting up a completely different
test environment, I will postpone IA64 until crashdc has reached 'beta'.
For more details, including architectural diagrams, I posted an
update on my blog here:
http://cariblog.kamikamamak.com/2010/01/14/crashdc-an-update/
Kind Regards,
...Louis
P.S. Please let me know if you prefer that I refrain from posting things
about crashdc here.
- --
Louis Bouchard, Linux Support Engineer
Team lead, EMEA Linux Competency Center,
Linux Ambassador, HP
HP Services 1 Ave du Canada
HP France Z.A. de Courtaboeuf
louis.bouchard(a)hp.com 91 947 Les Ulis
http://www.hp.com/go/linux France
http://www.hp.com/fr
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAktO13sACgkQDvqokHrhnCy30ACgkK5D8SyJq0Ce7kQFMytYu04U
FKkAoO/atx/u9iwaCuRJn04OD9dObMOQ
=Pz8Z
-----END PGP SIGNATURE-----
14 years, 11 months