On 13.04.2016 23:27, Dave Anderson wrote:
----- Original Message -----
> Initial version of a crash module which can be used to show which cgroups
> a process is a member of.
>
> Signed-off-by: Nikolay Borisov <n.borisov.lkml(a)gmail.com>
> ---
>
> So here is the second version of the proccgroup module. Changes since v1:
>
> * Now show the full path to the cgroup (limited to 4k long paths).
> * Added support for passing either pid or hex address of task struct, so that
> cgroup info can be acquired for an arbitrary task
> * Added support for pre-3.15 kernels
> * Removed leftovers from the echo module
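
For illustration, the right-to-left path assembly behind the "full path" item above can be sketched standalone. This is only a sketch and not the module's code: prepend_component() and build_path() are hypothetical names, and the path components are passed in as an array (leaf first) instead of being read from a dump with readmem(). Unlike the patch, it checks the remaining space before moving the pointer, so the pointer never drops below the start of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "/" + name immediately in front of *start.
 * Returns 0 on success, -1 if the component does not fit in buf. */
static int prepend_component(char *buf, char **start, const char *name)
{
    size_t len = strlen(name);

    /* Room is needed for the name plus one leading slash. */
    if ((size_t)(*start - buf) < len + 1)
        return -1;

    *start -= len;
    memcpy(*start, name, len);
    *start -= 1;
    **start = '/';
    return 0;
}

/* Builds an absolute path from names[], ordered leaf first and the
 * component closest to the root last. Returns a pointer into buf where
 * the NUL-terminated path begins, or NULL if it does not fit. */
static char *build_path(char *buf, size_t buflen,
                        const char *const *names, size_t n)
{
    char *start;
    size_t i;

    if (buflen < 2)
        return NULL;

    start = buf + buflen - 1;
    *start = '\0';              /* terminate at the very end of buf */

    for (i = 0; i < n; i++) {
        if (prepend_component(buf, &start, names[i]) < 0)
            return NULL;
    }

    if (n == 0) {               /* task sits in the root cgroup */
        start -= 1;
        *start = '/';
    }
    return start;
}
```

Called with the components "session-c1.scope", "user-1000.slice", "user.slice" (in that order), build_path() yields the path shown in the module's help text.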
Hello Nikolay,
While cgroups have existed since 2.6.24, it appears that cgroup.name
was introduced in 3.10, and cgroup.kn in 3.15. So I have only a
limited set of sample 3.10+ dumpfiles that I could test it on.
I have many 3.10-based RHEL7 kernels, and the same error occurs on
all of them:
crash> sys | grep RELEASE
RELEASE: 3.10.0-327.el7.x86_64
crash> showcg
showcg: invalid kernel virtual address: ff88046666e03060 type: "cgroup_subsys->name"
crash>
The bad address looks to come from this line:
readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"),
KVADDR, &cgroup_subsys_ptr, sizeof(void *),
"cgroup_subsys_state->ss", FAULT_ON_ERROR);
because the 3.10 kernel does not have a cgroup_subsys_state.ss field, which was
added in 4.2:
It was actually added in 3.12.
crash> cgroup_subsys_state
struct cgroup_subsys_state {
struct cgroup *cgroup;
atomic_t refcnt;
unsigned long flags;
struct css_id *id;
struct work_struct dput_work;
}
SIZE: 64
crash>
Unfortunately you don't have the benefit of being able to use OFFSET(), which
would fail immediately. MEMBER_OFFSET() returns -1 on invalid requests, so you
really have to verify the return value, or add it to your MEMBER_OFFSET() verifications
during your init function.
I guess on pre-3.12 kernels I will just skip printing the name of the
subsystem. I will take a brief look at whether I could recreate the logic
in the module rather than relying on traversing structs, but I don't
consider this high priority.
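
A minimal standalone sketch of the "verify the return value" advice follows. It does not use crash's defs.h: lookup_member_offset() is a hypothetical stand-in for MEMBER_OFFSET() that returns -1 for an unknown member, and the small table only mimics the 3.10 cgroup_subsys_state layout shown above (the offsets are illustrative). The point is that the address is computed only when the offset is valid, so the caller can skip the subsystem name instead of reading from base - 1.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative member table; a real extension would ask crash instead. */
struct member_info {
    const char *type;
    const char *member;
    long offset;
};

static const struct member_info known_members[] = {
    { "cgroup_subsys_state", "cgroup", 0 },
    { "cgroup_subsys_state", "refcnt", 8 },
    { "cgroup_subsys_state", "flags", 16 },
};

/* Hypothetical stand-in for MEMBER_OFFSET(): byte offset, or -1 when the
 * struct/member pair is not known. */
static long lookup_member_offset(const char *type, const char *member)
{
    size_t i;

    for (i = 0; i < sizeof(known_members) / sizeof(known_members[0]); i++) {
        if (strcmp(known_members[i].type, type) == 0 &&
            strcmp(known_members[i].member, member) == 0)
            return known_members[i].offset;
    }
    return -1;
}

/* Stores base + offset in *addr only when the member exists.
 * Returns 1 on success; returns 0 and leaves *addr untouched otherwise. */
static int member_address(unsigned long base, const char *type,
                          const char *member, unsigned long *addr)
{
    long off = lookup_member_offset(type, member);

    if (off < 0)
        return 0;       /* member missing on this kernel: caller skips it */

    *addr = base + (unsigned long)off;
    return 1;
}
```

With this shape, a lookup of "ss" on the 3.10 layout returns 0, and the command could print the cgroup path without the subsystem name.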
And there were these oddities on later kernel versions:
All 3 of my sample 3.13-based Fedora kernels result in this output:
crash> sys | grep RELEASE
RELEASE: 3.13.0-0.rc1.git2.1.fc20.x86_64
crash> showcg
subsys: cpuset cgroup: /
subsys: cpu cgroup: /
subsys: cpuacct cgroup: /
subsys: memory cgroup: /
subsys: devices cgroup: /
subsys: freezer cgroup: /
subsys: net_cls cgroup: /
subsys: blkio cgroup: /
subsys: perf_event cgroup: /
subsys: hugetlb cgroup: /
showcg: invalid kernel virtual address: 0 type: "cgroup_subsys_state->cgroup"
crash>
I didn't look into why they all end that way. Maybe there's a NULL pointer in
the last entry in the subsys array?
I will have to test this on a 3.13 kernel.
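
If the cause does turn out to be an unused (NULL) slot in css_set->subsys, which is unverified at this point, one possible guard is to filter the snapshot of the array before dereferencing anything. The sketch below is standalone; collect_valid_subsys() is a hypothetical helper name and the array is passed in rather than read with readmem().

```c
#include <stddef.h>

/* Copies the non-NULL entries of a css_set->subsys snapshot into out[],
 * which must have room for at least count elements.
 * Returns the number of entries kept. */
static size_t collect_valid_subsys(const unsigned long *subsys, size_t count,
                                   unsigned long *out)
{
    size_t i;
    size_t kept = 0;

    for (i = 0; i < count; i++) {
        if (subsys[i] == 0)
            continue;           /* unused slot: nothing to dereference */
        out[kept++] = subsys[i];
    }
    return kept;
}
```

The per-subsystem loop would then iterate over the kept entries only, so a trailing zero pointer no longer ends the command with a read error.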
And lastly, I only have one 3.14-based kernel, which shows this:
crash> sys | grep RELEASE
RELEASE: 3.14.0-rc1+
crash> showcg
showcg: zero-size memory allocation! (called from 7f3280273719)
crash>
which would come from an en_subsys_cnt value of 0 here:
en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
which depends upon CGROUP_SUBSYS_COUNT being something non-zero:
/*
* Set of subsystem states, one for each subsystem. This array is
* immutable after creation apart from the init_css_set during
* subsystem registration (at boot time).
*/
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
And in that kernel apparently CONFIG_CGROUPS was not configured and
therefore CGROUP_SUBSYS_COUNT is 0:
But there is already logic in the initialization routine which should
handle cases where CONFIG_CGROUPS is not selected, simply by checking
whether the "cgroups" member in task_struct exists. I checked on LXR and
this member has always been protected by #ifdef CONFIG_CGROUPS. Maybe
this is Fedora kernel specific? Can you please check whether the
'cgroups' member in the definition of task_struct is protected by
an ifdef guard? I can easily augment the check to consider the size of
the subsys array. I tested the code on 3.12, and with !CONFIG_CGROUPS the
extension correctly bails out.
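
A sketch of what the augmented size check could look like, written standalone so it does not depend on defs.h. subsys_slot_count() is a hypothetical helper; it takes the byte size that MEMBER_SIZE("css_set", "subsys") would report, and it assumes a failed lookup is reported as a negative value.

```c
#include <stddef.h>

/* Returns the number of pointer-sized slots in an array member of the
 * given byte size. A size of zero (empty subsys[] array) or a negative
 * size (assumed to mean the lookup failed) yields 0. */
static size_t subsys_slot_count(long member_size)
{
    if (member_size <= 0)
        return 0;
    return (size_t)member_size / sizeof(void *);
}
```

The init routine, or show_proc_cgroups() before it calls GETBUF(), could treat a count of 0 the same way as a missing task_struct.cgroups member and bail out with a message.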
#else /* CONFIG_CGROUPS */
#define CGROUP_SUBSYS_COUNT 0
static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
#endif /* CONFIG_CGROUPS */
making css_set.subsys an empty array:
crash> css_set
struct css_set {
atomic_t refcount;
struct hlist_node hlist;
struct list_head tasks;
struct list_head cgrp_links;
struct cgroup_subsys_state *subsys[];
struct callback_head callback_head;
}
SIZE: 72
crash> css_set -o
struct css_set {
[0] atomic_t refcount;
[8] struct hlist_node hlist;
[24] struct list_head tasks;
[40] struct list_head cgrp_links;
[56] struct cgroup_subsys_state *subsys[];
[56] struct callback_head callback_head;
}
SIZE: 72
crash>
The other 3.18 and 4.x based kernels ran the command OK.
Another thing I might suggest if your idea is to assist in the
actual debugging of cgroup problems -- would be to print the
address of key data structures as part of the command's output.
That kind of thing is done by most crash commands, so that a user
can quickly dump, for example, the target cgroup structure, or
perhaps some of the other structures that would be helpful to
fully display.
On the other hand, maybe all you're interested in seeing is the
cgroup name and path? I don't know -- that's up to you.
For now my intention is to have a quick way to know which cgroups a
process is a member of. If someone can provide a use case showing which
addresses might be useful, I will consider adding those.
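
As a rough idea of the suggested output, the sketch below formats one line that carries the cgroup structure address next to the path. It is standalone and hypothetical: format_cgroup_line(), the column widths and the field labels are assumptions, not existing showcg output.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats one output line containing the subsystem name, the kernel
 * address of the cgroup structure (hex, as crash prints addresses) and
 * the cgroup path. Returns what snprintf() returns. */
static int format_cgroup_line(char *out, size_t outlen,
                              const char *subsys_name,
                              unsigned long cgroup_addr, const char *path)
{
    return snprintf(out, outlen, "subsys: %-12s cgroup: %lx  path: %s",
                    subsys_name, cgroup_addr, path);
}
```

With the address on the line, a user could paste it straight into a "struct cgroup <address>" command to dump the structure.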
Also, you don't have to post your module as a patch to the
extensions subdirectory. I'm not going to add the file to the
crash sources contained in the tar.gz or src.rpm releases, but
rather I will post your module source file, and directions on
how to build it, on the extensions web page accessible from
http://people.redhat.com/anderson/extensions.html. So you can
just attach the module's C file to your email to this mailing list.
OK, I will keep this in mind for my next posting.
Thanks a lot for the detailed and helpful feedback!
Thanks,
Dave
>
> extensions/proccgroup.c | 278 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 278 insertions(+)
> create mode 100644 extensions/proccgroup.c
>
> diff --git a/extensions/proccgroup.c b/extensions/proccgroup.c
> new file mode 100644
> index 0000000..aee735b
> --- /dev/null
> +++ b/extensions/proccgroup.c
> @@ -0,0 +1,278 @@
> +/*
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * Nikolay Borisov <n.borisov.lkml(a)gmail.com>
> + */
> +
> +#include <stdbool.h>
> +#include "defs.h"
> +
> +#define MAX_CGROUP_PATH 4096
> +
> +static void showcgrp(void);
> +char *help_proc_cgroups[];
> +
> +static struct command_table_entry command_table[] = {
> + { "showcg", showcgrp, help_proc_cgroups, 0},
> + { NULL },
> +};
> +
> +
> +void __attribute__((constructor))
> +proccgroup_init(void)
> +{
> +
> + if (!MEMBER_EXISTS("task_struct", "cgroups") ||
> + (!MEMBER_EXISTS("cgroup", "kn") && !MEMBER_EXISTS("cgroup", "name")))
> + {
> + fprintf(fp, "Unrecognised or disabled cgroup support\n");
> + return;
> + }
> +
> + register_extension(command_table);
> +}
> +
> +void __attribute__((destructor))
> +proccgroup_finish(void) { }
> +
> +/* Prepends contents of cgroup_name to buf, using start as a pointer
> + * index into buf
> + */
> +static void prepend_string(char *buf, char **start, char *cgroup_name) {
> +
> + int len = strlen(cgroup_name);
> + *start -= len;
> +
> + if (*start < buf) {
> + error(FATAL, "Cgroup too long to parse\n");
> + }
> +
> + memcpy(*start, cgroup_name, len);
> +
> + if (--*start < buf) {
> + error(FATAL, "Cgroup too long to parse\n");
> + }
> +
> + **start = '/';
> +}
> +
> +/* For post-3.15 kernels */
> +static void get_cgroup_name_kn(ulong cgroup, char *buf, int buflen)
> +{
> + ulong kernfs_node;
> + ulong cgroup_name_ptr;
> + ulong kernfs_parent;
> + bool slash_prepended = false;
> + char cgroup_name[BUFSIZE];
> + char *start = buf + buflen - 1;
> + *start = '\0'; //null terminate the end
> +
> + /* Get cgroup->kn */
> + readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR, &kernfs_node, sizeof(void *),
> + "cgroup->kn", FAULT_ON_ERROR);
> +
> + do {
> + /* Get kn->name */
> + readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
> + "kernfs_node->name", FAULT_ON_ERROR);
> + /* Get kn->parent */
> + readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"), KVADDR, &kernfs_parent, sizeof(void *),
> + "kernfs_node->parent", FAULT_ON_ERROR);
> +
> + if (kernfs_parent != 0) {
> + read_string(cgroup_name_ptr, cgroup_name, BUFSIZE-1);
> + prepend_string(buf, &start, cgroup_name);
> + slash_prepended = true;
> + } else if (!slash_prepended) {
> + if (--start < buf) {
> + error(FATAL, "Cgroup too long to parse\n");
> + }
> + *start = '/';
> + }
> +
> + kernfs_node = kernfs_parent;
> +
> + } while(kernfs_parent);
> +
> + memmove(buf, start, buf + buflen - start);
> +}
> +
> +/* For pre-3.15 kernels */
> +static void get_cgroup_name_old(ulong cgroup, char *buf, size_t buflen)
> +{
> + ulong cgroup_name_ptr;
> + ulong cgroup_parent_ptr;
> + char cgroup_name[BUFSIZE];
> + char *start = buf + buflen - 1;
> + *start = '\0'; //null terminate the end
> + bool slash_prepended = false;
> +
> + do {
> + /* Get cgroup->name */
> + readmem(cgroup + MEMBER_OFFSET("cgroup", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
> + "cgroup->name", FAULT_ON_ERROR);
> + /* Get cgroup->parent */
> + readmem(cgroup + MEMBER_OFFSET("cgroup", "parent"), KVADDR, &cgroup_parent_ptr, sizeof(void *),
> + "cgroup->parent", FAULT_ON_ERROR);
> +
> + read_string(cgroup_name_ptr + MEMBER_OFFSET("cgroup_name", "name"), cgroup_name, BUFSIZE-1);
> +
> + if (cgroup_parent_ptr) {
> + prepend_string(buf, &start, cgroup_name);
> + slash_prepended = true;
> + } else if (!slash_prepended) {
> + if (--start < buf)
> + break;
> + *start = '/';
> + }
> +
> + cgroup = cgroup_parent_ptr;
> +
> + } while(cgroup_parent_ptr);
> +
> + memmove(buf, start, buf + buflen - start);
> +}
> +
> +static void get_subsys_name(ulong subsys, char *buf, size_t buflen)
> +{
> + ulong subsys_name_ptr;
> + ulong cgroup_subsys_ptr;
> +
> + /* Get cgroup_subsys_state->ss */
> + readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
> + "cgroup_subsys_state->ss", FAULT_ON_ERROR);
> +
> + readmem(cgroup_subsys_ptr + MEMBER_OFFSET("cgroup_subsys", "name"), KVADDR, &subsys_name_ptr, sizeof(void *),
> + "cgroup_subsys->name", FAULT_ON_ERROR);
> + read_string(subsys_name_ptr, buf, buflen-1);
> +}
> +
> +static void get_cgroup_name(ulong cgroup, ulong subsys)
> +{
> + char *cgroup_path = GETBUF(MAX_CGROUP_PATH);
> + char subsys_name[BUFSIZE];
> +
> + /* Handle the 2 cases of cgroup_name and the kernfs one */
> + if (MEMBER_EXISTS("cgroup", "kn")) {
> + get_cgroup_name_kn(cgroup, cgroup_path, MAX_CGROUP_PATH);
> + } else if (MEMBER_EXISTS("cgroup", "name")) {
> + get_cgroup_name_old(cgroup, cgroup_path, MAX_CGROUP_PATH);
> + }
> +
> + get_subsys_name(subsys, subsys_name, BUFSIZE);
> +
> + fprintf(fp, "subsys: %-20s cgroup: %s\n", subsys_name, cgroup_path);
> +
> + FREEBUF(cgroup_path);
> +}
> +
> +
> +void show_proc_cgroups(ulong task_ctx) {
> + int en_subsys_cnt;
> + int i;
> + ulong *cgroup_subsys_arr;
> + ulong subsys_base_ptr;
> + ulong cgroups_subsys_ptr = 0;
> +
> +
> + /* Get address of task_struct->cgroups */
> + readmem(task_ctx + MEMBER_OFFSET("task_struct", "cgroups"),
> + KVADDR, &cgroups_subsys_ptr, sizeof(void *),
> + "task_struct->cgroups", FAULT_ON_ERROR);
> +
> + subsys_base_ptr = cgroups_subsys_ptr + MEMBER_OFFSET("css_set", "subsys");
> + en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
> + cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
> +
> + /* Get the contents of the css_set->subsys array */
> + readmem(subsys_base_ptr, KVADDR, cgroup_subsys_arr, sizeof(ulong) * en_subsys_cnt,
> + "css_set->subsys", FAULT_ON_ERROR);
> +
> + for (i = 0; i < en_subsys_cnt; i++) {
> + ulong cgroup;
> +
> + /* Get cgroup_subsys_state -> cgroup */
> + readmem(cgroup_subsys_arr[i] + MEMBER_OFFSET("cgroup_subsys_state", "cgroup"),
> + KVADDR, &cgroup, sizeof(void *), "cgroup_subsys_state->cgroup", FAULT_ON_ERROR);
> +
> + get_cgroup_name(cgroup, cgroup_subsys_arr[i]);
> + }
> +
> + FREEBUF(cgroup_subsys_arr);
> +}
> +
> +
> +static void showcgrp(void) {
> +
> + ulong value;
> + struct task_context *tc;
> + ulong task_struct_ptr = 0;
> +
> + while (args[++optind]) {
> + if (IS_A_NUMBER(args[optind])) {
> + switch (str_to_context(args[optind], &value, &tc))
> + {
> + case STR_PID:
> + task_struct_ptr = tc->task;
> + ++optind;
> + break;
> +
> + case STR_TASK:
> + task_struct_ptr = value;
> + ++optind;
> + break;
> +
> + case STR_INVALID:
> + error(FATAL, "invalid task or pid value: %s\n\n",
> + args[optind]);
> + break;
> + }
> + } else {
> + if (argcnt > 1)
> + error(FATAL, "invalid task or pid value: %s\n",args[optind]);
> + else
> + break;
> + }
> + }
> +
> + if (!task_struct_ptr) {
> + task_struct_ptr = CURRENT_TASK();
> + }
> +
> + show_proc_cgroups(task_struct_ptr);
> +}
> +
> +char *help_proc_cgroups[] = {
> + "showcg",
> + "Show which cgroups a process is a member of",
> + " [task | pid]",
> +
> + " This command prints the cgroup for each subsys that a process is a member of",
> + "\nExample",
> + " Show the cgroup for the currently active process:\n",
> + " crash> showcg",
> + " subsys: cpuset cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: cpu cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: cpuacct cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: blkio cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: memory cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: devices cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: freezer cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: net_cls cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: perf_event cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: net_prio cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + " subsys: hugetlb cgroup: /user.slice/user-1000.slice/session-c1.scope",
> + "\n Alternatively you can pass either a pid or a task pointer to show the cgroup the",
> + " respective process is a member of e.g:\n",
> + " crash> showcg 1064\n OR",
> + " crash> showcg ffff880405711b80",
> +
> +
> +
> + NULL
> +};
> +
> +
> --
> 2.5.0
>