[PATCH 0/3] Add support for TASK_IDLE and TASK_NEW task states
by Kazuhito Hagio
kernel commit 06eb61844d ("sched/debug: Add explicit TASK_IDLE
printing") exposed the TASK_IDLE task state to user space as
'I (idle)' state.
$ cat /proc/4/status
Name: kworker/0:0H
Umask: 0000
State: I (idle)
$ ps 4
PID TTY STAT TIME COMMAND
4 ? I< 0:00 [kworker/0:0H]
On the other hand, crash still shows 'UN' for TASK_IDLE state.
crash> ps 4
PID PPID CPU TASK ST %MEM VSZ RSS COMM
4 2 0 ffff8d1dbe884200 UN 0.0 0 0 [kworker/0:0H]
crash> ps -S
RU: 3
IN: 69
UN: 53
This is confusing for support engineers. Moreover, 'foreach UN bt', which
shows useful information when investigating problems such as system stalls,
also includes idle tasks that are not expected there. So let's print
TASK_IDLE as the 'ID' state. [Patch 2]
However, since Linux 4.14, kernel commit 20435d84e5 ("sched/debug:
Intruduce task_state_to_char() helper function") removed the 'stat_nam'
symbol, which we have been using to get the values of the task state
bitmasks. So now we need to get them correctly by using 'task_state_array'
again. [Patch 1]
Additionally, kernel commit 7dc603c902 ("sched/fair: Fix PELT integrity
for new tasks") introduced the TASK_NEW state. [Patch 3] adds support for
it as the 'NE' state.
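As a rough illustration of the mapping described above, here is a minimal sketch of how a state bitmask could be turned into a two-letter code. The bit values are assumptions taken from include/linux/sched.h as of Linux 4.14; crash itself derives them from the kernel being debugged rather than hard-coding them. The helper name task_state_code() is hypothetical and is not part of this series.

```c
/* Assumed bit values (Linux 4.14 include/linux/sched.h).
 * A real tool should read them from the debugged kernel. */
#define TASK_RUNNING          0x0000UL
#define TASK_INTERRUPTIBLE    0x0001UL
#define TASK_UNINTERRUPTIBLE  0x0002UL
#define TASK_NOLOAD           0x0400UL
#define TASK_NEW              0x0800UL
#define TASK_IDLE             (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Hypothetical helper: map a task state bitmask to a two-letter code. */
const char *task_state_code(unsigned long state)
{
	if (state == TASK_RUNNING)
		return "RU";
	if (state & TASK_NEW)
		return "NE";
	/* TASK_IDLE contains the TASK_UNINTERRUPTIBLE bit, so it has to
	 * be tested first; otherwise idle tasks are reported as 'UN'. */
	if ((state & TASK_IDLE) == TASK_IDLE)
		return "ID";
	if (state & TASK_UNINTERRUPTIBLE)
		return "UN";
	if (state & TASK_INTERRUPTIBLE)
		return "IN";
	return "??";
}
```

The order of the tests is the point of the sketch: checking TASK_UNINTERRUPTIBLE before TASK_IDLE reproduces exactly the 'UN' output shown above for kworker/0:0H.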
Kazuhito Hagio (3):
Fix task state bitmasks for 4.14 and later
Add support for TASK_IDLE task state
Add support for TASK_NEW task state
help.c | 4 ++--
task.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 54 insertions(+), 4 deletions(-)
--
1.8.3.1
6 years, 4 months
[PATCH] x86_64: Remove the unused x86_64_task_uses_5level()
by Dou Liyang
There is no way to enable a paging mode on a per-task basis, so the
per-task check is redundant. Remove x86_64_task_uses_5level().
Signed-off-by: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
---
x86_64.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/x86_64.c b/x86_64.c
index b07d6f2..96f685b 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -24,7 +24,6 @@ static int x86_64_uvtop(struct task_context *, ulong, physaddr_t *, int);
static int x86_64_uvtop_level4(struct task_context *, ulong, physaddr_t *, int);
static int x86_64_uvtop_level4_xen_wpt(struct task_context *, ulong, physaddr_t *, int);
static int x86_64_uvtop_level4_rhel4_xen_wpt(struct task_context *, ulong, physaddr_t *, int);
-static int x86_64_task_uses_5level(struct task_context *);
static ulong x86_64_vmalloc_start(void);
static int x86_64_is_task_addr(ulong);
static int x86_64_verify_symbol(const char *, ulong, char);
@@ -341,6 +340,7 @@ x86_64_init(int when)
if (l5_enabled)
machdep->flags |= VM_5LEVEL;
}
+
if (machdep->flags & VM_5LEVEL) {
machdep->machspec->userspace_top = USERSPACE_TOP_5LEVEL;
machdep->machspec->page_offset = PAGE_OFFSET_5LEVEL;
@@ -361,7 +361,6 @@ x86_64_init(int when)
machdep->uvtop = x86_64_uvtop_level4; /* 5-level is optional per-task */
machdep->kvbase = (ulong)PAGE_OFFSET;
machdep->identity_map_base = (ulong)PAGE_OFFSET;
-
}
/*
@@ -1915,7 +1914,7 @@ x86_64_uvtop_level4(struct task_context *tc, ulong uvaddr, physaddr_t *paddr, in
goto no_upage;
/* If the VM is in 5-level page table */
- if (machdep->flags & VM_5LEVEL && x86_64_task_uses_5level(tc)) {
+ if (machdep->flags & VM_5LEVEL) {
ulong p4d_pte;
/*
* p4d = p4d_offset(pgd, address);
@@ -1986,12 +1985,6 @@ no_upage:
return FALSE;
}
-static int
-x86_64_task_uses_5level(struct task_context *tc)
-{
- return FALSE;
-}
-
static int
x86_64_uvtop_level4_xen_wpt(struct task_context *tc, ulong uvaddr, physaddr_t *paddr, int verbose)
{
--
2.14.3
6 years, 4 months
Seek help about 5-level paging
by Dou Liyang
Dear Kirill,
Sorry to trouble you.
I am trying to make Crash able to parse kernels with 5-level paging.
I have run into a problem and would like to ask for help.
IMO, all user-space tasks must use 5-level paging if the kernel
is in 5-level paging mode, as indicated by '__pgtable_l5_enabled=1'. Correct?
Also, Documentation/x86/x86_64/5level-paging.txt says:
...
To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.
But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 47-bits.
...
I guess it just means that some user-space tasks can't use addresses
above 47 bits; it doesn't mean that those tasks fall back to 4-level
paging when the kernel is in 5-level paging mode. Is that right?
Thanks,
dou
6 years, 4 months
[PATCH v2] x86_64: Make the conversion between 4level and 5level paging automatically
by Dou Liyang
Currently, Crash only enables support for kernel-only 5-level page tables when
the command line option "--machdep vm=5level" is given. Since Linux 4.17, the
same kernel can run with either 4-level or 5-level page tables, so a fixed
command line option does not work well.
Use "pgtable_l5_enabled" to detect automatically whether the kernel is running
with 5-level page tables. Also move the 5-level paging setup from
machdep_init(PRE_GDB) to machdep_init(POST_RELOC).
Signed-off-by: Dave Anderson <anderson(a)redhat.com>
Signed-off-by: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
---
Changelog v1 --> v2
1. Support live systems, as suggested by Dave.
2. Use __pgtable_l5_enabled for the check.
3. Tested with both kdump (5-level and 4-level) and virsh dump (5-level and 4-level).
---
x86_64.c | 56 +++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 37 insertions(+), 19 deletions(-)
diff --git a/x86_64.c b/x86_64.c
index 6d1ae2f..07b6aa9 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -294,25 +294,6 @@ x86_64_init(int when)
machdep->machspec->pgdir_shift = PGDIR_SHIFT;
machdep->machspec->ptrs_per_pgd = PTRS_PER_PGD;
break;
-
- case VM_5LEVEL:
- machdep->machspec->userspace_top = USERSPACE_TOP_5LEVEL;
- machdep->machspec->page_offset = PAGE_OFFSET_5LEVEL;
- machdep->machspec->vmalloc_start_addr = VMALLOC_START_ADDR_5LEVEL;
- machdep->machspec->vmalloc_end = VMALLOC_END_5LEVEL;
- machdep->machspec->modules_vaddr = MODULES_VADDR_5LEVEL;
- machdep->machspec->modules_end = MODULES_END_5LEVEL;
- machdep->machspec->vmemmap_vaddr = VMEMMAP_VADDR_5LEVEL;
- machdep->machspec->vmemmap_end = VMEMMAP_END_5LEVEL;
- if (symbol_exists("vmemmap_populate"))
- machdep->flags |= VMEMMAP;
- machdep->machspec->physical_mask_shift = __PHYSICAL_MASK_SHIFT_5LEVEL;
- machdep->machspec->pgdir_shift = PGDIR_SHIFT_5LEVEL;
- machdep->machspec->ptrs_per_pgd = PTRS_PER_PGD_5LEVEL;
- if ((machdep->machspec->p4d = (char *)malloc(PAGESIZE())) == NULL)
- error(FATAL, "cannot malloc p4d space.");
- machdep->machspec->last_p4d_read = 0;
- machdep->uvtop = x86_64_uvtop_level4; /* 5-level is optional per-task */
}
machdep->kvbase = (ulong)PAGE_OFFSET;
machdep->identity_map_base = (ulong)PAGE_OFFSET;
@@ -346,6 +327,43 @@ x86_64_init(int when)
break;
case POST_RELOC:
+ /* Check for 5-level paging */
+ if (!(machdep->flags & VM_5LEVEL)) {
+ int l5_enabled = 0;
+ if ((string = pc->read_vmcoreinfo("NUMBER(pgtable_l5_enabled)"))) {
+ l5_enabled = 1;
+ free(string);
+ } else if (kernel_symbol_exists("__pgtable_l5_enabled"))
+ readmem(symbol_value("__pgtable_l5_enabled"), KVADDR,
+ &l5_enabled, sizeof(int), "__pgtable_l5_enabled",
+ FAULT_ON_ERROR);
+
+ if (l5_enabled)
+ machdep->flags |= VM_5LEVEL;
+ }
+ if (machdep->flags & VM_5LEVEL) {
+ machdep->machspec->userspace_top = USERSPACE_TOP_5LEVEL;
+ machdep->machspec->page_offset = PAGE_OFFSET_5LEVEL;
+ machdep->machspec->vmalloc_start_addr = VMALLOC_START_ADDR_5LEVEL;
+ machdep->machspec->vmalloc_end = VMALLOC_END_5LEVEL;
+ machdep->machspec->modules_vaddr = MODULES_VADDR_5LEVEL;
+ machdep->machspec->modules_end = MODULES_END_5LEVEL;
+ machdep->machspec->vmemmap_vaddr = VMEMMAP_VADDR_5LEVEL;
+ machdep->machspec->vmemmap_end = VMEMMAP_END_5LEVEL;
+ if (symbol_exists("vmemmap_populate"))
+ machdep->flags |= VMEMMAP;
+ machdep->machspec->physical_mask_shift = __PHYSICAL_MASK_SHIFT_5LEVEL;
+ machdep->machspec->pgdir_shift = PGDIR_SHIFT_5LEVEL;
+ machdep->machspec->ptrs_per_pgd = PTRS_PER_PGD_5LEVEL;
+ if ((machdep->machspec->p4d = (char *)malloc(PAGESIZE())) == NULL)
+ error(FATAL, "cannot malloc p4d space.");
+ machdep->machspec->last_p4d_read = 0;
+ machdep->uvtop = x86_64_uvtop_level4; /* 5-level is optional per-task */
+ machdep->kvbase = (ulong)PAGE_OFFSET;
+ machdep->identity_map_base = (ulong)PAGE_OFFSET;
+
+ }
+
/*
* Check for CONFIG_RANDOMIZE_MEMORY, and set page_offset here.
* The remainder of the virtual address range setups will get
--
2.14.3
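To make the effect of the detected flag more concrete, here is a minimal, self-contained sketch of how the top-level page table geometry differs between the two modes. It is not taken from the patch: the struct and function names are hypothetical, and the shift and address-width values (39/48 bits for 4-level, 48/57 bits for 5-level) are the usual x86_64 layout, stated here as assumptions rather than as the crash macros.

```c
#include <stdint.h>

/* Hypothetical description of the paging geometry in use. */
struct paging_layout {
	int levels;       /* 4 or 5 */
	int pgdir_shift;  /* bit position of the top-level index */
	int va_bits;      /* width of a canonical virtual address */
};

/* Pick the geometry from the detected flag (0 means 4-level). */
struct paging_layout select_paging_layout(int l5_enabled)
{
	struct paging_layout p;

	if (l5_enabled) {
		p.levels = 5;
		p.pgdir_shift = 48;
		p.va_bits = 57;
	} else {
		p.levels = 4;
		p.pgdir_shift = 39;
		p.va_bits = 48;
	}
	return p;
}

/* Index into the top-level table; each table holds 512 entries. */
unsigned int pgd_index(uint64_t vaddr, const struct paging_layout *p)
{
	return (unsigned int)((vaddr >> p->pgdir_shift) & 511);
}

/* First address above user space: half of the canonical range. */
uint64_t userspace_limit(const struct paging_layout *p)
{
	return (uint64_t)1 << (p->va_bits - 1);
}
```

The same virtual address resolves to a different top-level index depending on the mode, which is why the setup has to wait until the flag can actually be read from the vmcoreinfo data or from kernel memory.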
6 years, 4 months
[PATCH 0/5] Add Brent algorithm as an option for the 'list' command
by Dave Wysochanski
The list command by default uses a hash table for loop detection, and thus
uses an increasing amount of memory as each list entry is traversed. For
larger lists, this can be a significant problem. We have even seen cases
where crash commands run for days iterating lists, mostly because the
ever-increasing memory usage causes crash to slow down. There is an option
to avoid the overhead of the hash table, "hash off", but then you lose the
ability to detect a loop.
This patchset adds an alternative algorithm for loop detection while only
using a fixed amount of memory. The '-B' option is added to the 'list'
command which invokes this new algorithm rather than the hash table.
Besides the low memory usage, the output of the list command is
slightly different when a loop is detected: along with the first
duplicate, the length of the loop and the distance to the loop are
printed.
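For readers unfamiliar with the technique, the following is a minimal sketch of Brent's cycle detection on an ordinary in-memory singly linked list. It is not the code from this series (which walks kernel lists through readmem); struct node and brent_find_loop() are hypothetical names used only for illustration.

```c
#include <stddef.h>

struct node {
	struct node *next;
};

/*
 * Brent's algorithm: returns 1 if the list starting at head contains
 * a loop, 0 if it ends in NULL. On success, *loop_len receives the
 * number of nodes in the loop and *dist the number of nodes before
 * the loop starts. Only a fixed number of variables is used.
 */
int brent_find_loop(const struct node *head, size_t *loop_len, size_t *dist)
{
	const struct node *tortoise, *hare;
	size_t power = 1, len = 1, mu = 0, i;

	if (head == NULL)
		return 0;

	/* Phase 1: find the loop length with a teleporting tortoise. */
	tortoise = head;
	hare = head->next;
	while (hare != NULL && hare != tortoise) {
		if (power == len) {
			/* Start a new power-of-two window. */
			tortoise = hare;
			power *= 2;
			len = 0;
		}
		hare = hare->next;
		len++;
	}
	if (hare == NULL)
		return 0;

	/* Phase 2: place the hare len nodes ahead, then advance both
	 * pointers in lockstep until they meet at the loop entry. */
	tortoise = hare = head;
	for (i = 0; i < len; i++)
		hare = hare->next;
	while (tortoise != hare) {
		tortoise = tortoise->next;
		hare = hare->next;
		mu++;
	}

	*loop_len = len;
	*dist = mu;
	return 1;
}
```

Unlike the hash table approach, memory use stays constant no matter how long the list is; the price is that each node in the loop may be visited more than once before the loop is confirmed.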
Dave Wysochanski (6):
Add do_list_no_hash() function similar to do_list() but without hash.
do_list_no_hash: factor out all the debug statements at entry into
do_list_debug_entry
do_list_no_hash: factor out structure output into static function
do_list_no_hash: factor out a small readmem function
Implement R. P. Brent's algorithm for loop detection for 'list'
command.
Add a '-B' flag to the list command to call the brent algorithm.
defs.h | 2 +
help.c | 6 ++
tools.c | 288 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 292 insertions(+), 4 deletions(-)
--
1.8.3.1
6 years, 4 months
[PATCH] x86_64: Make the conversion between 4level and 5level paging automatically
by Dou Liyang
Currently, Crash only enables support for kernel-only 5-level page tables when
the command line option "--machdep vm=5level" is given. Since Linux 4.17, the
same kernel can run with either 4-level or 5-level page tables, so a fixed
command line option does not work well.
Use the "pgtable_l5_enabled" value obtained from the vmcore to detect
automatically whether the kernel is running with 5-level page tables.
Signed-off-by: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
---
x86_64.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/x86_64.c b/x86_64.c
index 6d1ae2f..be6164b 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -203,6 +203,10 @@ x86_64_init(int when)
machdep->machspec->kernel_image_size = dtol(string, QUIET, NULL);
free(string);
}
+ if ((string = pc->read_vmcoreinfo("NUMBER(pgtable_l5_enabled)"))) {
+ machdep->flags |= VM_5LEVEL;
+ free(string);
+ }
if (SADUMP_DUMPFILE() || QEMU_MEM_DUMP_NO_VMCOREINFO() ||
VMSS_DUMPFILE())
/* Need for calculation of kaslr_offset and phys_base */
--
2.14.3
6 years, 4 months
Crash-utility issue with MIPS
by Mozes, Rachel
Hi all,
I'm an Intel employee working with a MIPS platform running Linux.
I enabled the kexec/kdump flow in the Linux version we use and now have a vmcore memory dump, but I'm running into trouble.
I'm running the crash utility compiled with "target=MIPS" and getting the following output:
crash: read error: kernel virtual address: xxxx30fc type: "possible"
WARNING: cannot read cpu_possible_map
crash: read error: kernel virtual address: xxxx30f4 type: "present"
WARNING: cannot read cpu_present_map
crash: read error: kernel virtual address: xxxx30f8 type: "online"
WARNING: cannot read cpu_online_map
crash: read error: kernel virtual address: xxxx30f0 type: "active"
WARNING: cannot read cpu_active_map
crash: read error: kernel virtual address:xxxx0a48 type: "shadow_timekeeper xtime_sec"
crash: read error: kernel virtual address: xxxxa444 type: "init_uts_ns"
crash: linux/vmlinux and vmcore do not match!
As you can see, the tool claims that the vmlinux doesn't match the vmcore, although the vmcore was captured from the given vmlinux.
When I open the vmcore with the gdb from our toolchain provided by MIPS, it opens successfully, and I see that init_uts_ns is at offset xxxxa440 rather than the xxxxa444 that the crash utility looks for.
I thought the issue might be that we use a proprietary toolchain provided by Codescape (see the link below), so I tried to replace the gdb package in the Makefile, but then the tool failed to compile, in the gdb_merge step:
http://codescape-mips-sdk.imgtec.com/components/toolchain/2017.10-05/down...
Can you help us understand how to overcome this issue? Is there any way to replace the gdb binary that the tool uses?
We would very much appreciate your help.
6 years, 4 months
crash-7.3.2 very long list iteration progressively increasing memory usage
by David Wysochanski
Hi Dave,
We have a fairly large vmcore (around 250GB) containing a very long kmem
cache list, and we are trying to determine whether a loop exists in it.
The list has literally billions of entries. Before you roll your eyes,
hear me out.
Just running the following command
crash> list -H 0xffff8ac03c81fc28 > list-yeller.txt
seems to increase the memory usage of crash very significantly over
time, to the point that top shows the following:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25522 dwysocha 20 0 11.2g 10g 5228 R 97.8 17.5 1106:34 crash
When I started the
command yesterday it was adding around 4 million entries to the file
per minute. At the time I estimated the command would finish in around
10 hours and I could use it to determine if there was a loop in the
list or not. But today it has slowed down to less than 1/10th of that, to
around 300k entries per minute.
Is this type of memory usage with list enumeration expected or not?
I have not yet begun to delve into the code, but figured you might have
a gut feel whether this is expected and fixable or not.
Thanks.
6 years, 4 months