Re: [Crash-utility] [PATCH] Add the proccgroup extension

Monday, 11 April 2016

On 04/11/2016 05:06 PM, Dave Anderson wrote:
...

 ----- Original Message -----
> Initial version of a crash module which can be used to show which cgroups
> is the currently active process member of.
> ---
> Hello this is a simple crash extension that I hacked up over the weekend, in my
> case when I look at kernel crash dump I want to quickly understand which cgroup
> is the current process member of. Currently it uses the process from the current
> context but this might change in the future. Here is an example output:
>
> crash> show_cgroups
> subsys: cpuset               cgroup: c6666
> subsys: cpu                  cgroup: c6666
> subsys: cpuacct              cgroup: c6666
> subsys: io                   cgroup: c6666
> subsys: memory               cgroup: c6666
> subsys: devices              cgroup: c6666
> subsys: freezer              cgroup: c6666
> subsys: perf_event           cgroup: c6666
> subsys: pids                 cgroup: c6666
>
> I have tested this on 4.6-rc2 with and without cgroup support enabled.
>
> I'm just sending this to get an initial idea whether I have used crash's
> facilities correctly and canvas for future ideas. I'm aware there are already 2
> cgroup modules but when I tried running either they complained of no cgroup 
> support or the command did nothing. In any case provided that the code is ok I
> guess this can be used as a good example of how to traverse structures with crash
>
> TODO:
>  * Make the command understand either task_struct pointer or pid being passed.
>  * Add support for pre-3.15 kernels (the cgroup name struct changed to kernfs at that
point)
>  * Whatever people think might be useful

 Hello Nikolay,

 I have a few comments and suggestions:

 +#include "defs.h"
 +
 +#define CGROUP_PATH_MAX

 What is the purpose of CGROUP_PATH_MAX?   
I was thinking of using this define to limit the size of various
buffers, but I saw that for most cases BUFSIZE works. This will likely
be removed.

...

 +void proccgroup_init(void);
 +void proccgroup_fini(void);
 +
 ... [ cut ] ...
 +
 +void __attribute__((constructor))
 +echo_init(void)
 +{
 +        register_extension(command_table);
 +}
 +
 +void __attribute__((destructor))
 +echo_fini(void) { }
 +

 You've got forward declarations of proccgroup_init() and proccgroup_fini(), 
 but the two functions don't exist.  The "echo_xxx()" functions in the 
 echo.c sample module are meant to be replaced by names reflecting your
 module.  
I will rename the echo* function, I guess I forgot doing that. Actually
are forward declarations for the init/destroy functions really needed? I
guess not.

...

 If I run this module on a 3.10 kernel, I get this:

 crash> extend proccgroup.so
 ./extensions/proccgroup.so: shared object loaded
 crash> show_cgroups
 Unsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported
kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel
versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel
versionshow_cgroups: invalid kernel virtual address: 0  type:
"cgroup_subsys_state->cgroup"
   Unsupported kernel version
 crash>

 because of this:

 +        /* Handle the 2 cases of cgroup_name and the kernfs one */
 +        if (MEMBER_EXISTS("cgroup", "kn")) {
 +            get_kn_cgroup_name(cgroup, subsys_ptr);
 +        } else if (MEMBER_EXISTS("cgroup", "name")) {
 +            fprintf(fp, "Unsupported kernel version");
 +        }

 You should replace the "fprintf(fp, "Unsupported kernel version")"
error message
 above with an "error(FATAL, ...)" construct (and add a "\n" to the
end), so that
 the command will fail immediately and return to the "crash> " prompt.

I've since added supported for pre-3.15 kernels. So this will likely
work, but I did see the output you showed. Clearly the failure case
wasn't handled properly.

...

 Alternatively, you can have your module purposely fail to load if any required
 structure members do not exist.  Typically modules will have their xxxx_init() 
 constructor function do some pre-checks of the kernel being analyzed, and if the
 command will not be able to run correctly, return before calling register_extension():

   void __attribute__((constructor))
   xxxx_init(void)
   {
    	if (!MEMBER_EXISTS(...) || !MEMBER_EXISTS(...)) {
   		fprintf(fp, "some error message\n");
   		return;
   	}

 	register_extension(command_table);
   }

 I note that all of your readmem() calls use RETURN_ON_ERROR, such as these examples:

 +static void get_cgroup_name(ulong cgroup, char *buf, size_t buflen) 
 +{
 +
 +    ulong kernfs_node;
 +    ulong cgroup_name_ptr;
 +    ulong kernfs_parent;
 +
 +    /* Get cgroup->kn */
 +    readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR,
&kernfs_node, sizeof(void *),
 +            "cgroup->kn", RETURN_ON_ERROR);
 +
 +    readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"),
KVADDR, &kernfs_parent, sizeof(void *),
 +            "kernfs_node->parent", RETURN_ON_ERROR);
 +
 +    if (kernfs_parent == 0) {
 +        sprintf(buf, "/");
 +        return;
 +    }
 +
 +    /* Get kn->name */
 +    readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"),
KVADDR, &cgroup_name_ptr, sizeof(void *),
 +            "kernfs_node->name", RETURN_ON_ERROR);
 +
 +    read_string(cgroup_name_ptr, buf, buflen-1);
 +}

 Unless you can do something useful in the case of a readmem() error, it's best to
 use FAULT_ON_ERROR.  You will get an error message automatically generated by readmem(),
 but similar to "error(FATAL, ...)" it will return immediately to the
"crash> " prompt.
 In your module above, you would just get a stream of confusing error messages. 
I thought return_on_error essentially works with fault_on_error
semantics. Clearly I've misunderstood that so will fix it.

...

 You might consider changing the name of the command to something that
 flows off the fingertips a bit easier:

 +static struct command_table_entry command_table[] = {
 +        { "show_cgroups", show_proc_cgroups, help_proc_cgroups, 0},
 +        { NULL },
 +};

 Maybe something like "cgrp", or "showcg", or anything short but to
 the point. 
Yes, I'm not very happy with the current name so will likely change that.

Thanks for taking the time to review it.

...

 Thanks,
   Dave 

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Crash-utility] [PATCH] Add the proccgroup extension