The numeric_forward() function serves as the comparator for qsort() when
sorting kernel symbols. For a modern Linux kernel containing hundreds of
thousands of symbols (N), qsort() performs O(N log N) comparisons,
meaning this function is invoked millions of times during the startup
phase of the crash utility.
During these millions of comparisons, it frequently checks for specific
symbol names (e.g., "_stext", "kaslr_get_random_long") using the
STREQ()
macro. STREQ() internally expands to a full strcmp(), incurring
significant overhead.
By explicitly checking the first character of the symbol name
(e.g., x->name[0] == '_') before invoking STREQ(), we introduce a
lightweight "early reject" mechanism. Since the distribution of kernel
symbol starting characters is relatively sparse, this short-circuits the
evaluation for the vast majority of symbols, completely avoiding the
overhead of strcmp().
Additionally, since x->name could potentially be NULL, we must safely
guard the character access with an explicit non-null check (x->name &&)
to prevent segmentation faults.
Performance measurement (6 alternating runs, ./crash /proc/kcore
startup, compiled with -O2, RISC-V 32-core, kernel
6.12.13.bsk.1-rc18-riscv64, vmlinux ~700MB with debug info):
before after
mean (s) 46.007 40.387
median (s) 45.891 40.821
min (s) 38.792 32.832
max (s) 58.572 46.062
Speedup (mean): 1.14x (-5.62 s, -12.2%)
perf stat:
instructions: 1.826e11 -> 1.670e11 (-8.5%)
cycles: 1.475e11 -> 1.123e11 (-23.8%)
perf record (top hotspots):
strcmp: 18.00% -> 11.42%
string_exists: not present in either (inlined under -O2)
Note that string_exists is inlined under -O2 and does not appear in the
profile. However, the optimization still provides a measurable
improvement because the strcmp calls within STREQ remain expensive --
numeric_forward is called millions of times during qsort, and each STREQ
expands to a full strcmp. The first-character check acts as a cheap
early-reject that avoids strcmp for the vast majority of symbols whose
names don't start with the target character.
The benefit is amplified on kernels with large symbol tables. This
RISC-V kernel has ~793,000 symbols in /proc/kallsyms, resulting in
~15.5M comparisons during qsort (O(N log N)). Additionally, mapping
symbols ($a/$d/$x) and .L* local symbols are included in the qsort
but later filtered out by verify_symbol(), further increasing the
number of comparisons.
Signed-off-by: Rui Qi <qirui.001(a)bytedance.com>
---
Changes in V3:
- Update performance data to corrected -O2 (make warn) measurements
- Add explanation of RISC-V vs x86_64 scaling difference
Changes in V2:
- Add performance measurement data (time, perf stat, perf record)
symbols.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/symbols.c b/symbols.c
index c0285058c5cb..c0353d626477 100644
--- a/symbols.c
+++ b/symbols.c
@@ -14412,16 +14412,16 @@ numeric_forward(const void *P_x, const void *P_y)
error(FATAL, "bfd_minisymbol_to_symbol failed\n");
if (st->_stext_vmlinux == UNINITIALIZED) {
- if (STREQ(x->name, "_stext"))
+ if (x->name && x->name[0] == '_' && STREQ(x->name,
"_stext"))
st->_stext_vmlinux = valueof(x);
- else if (STREQ(y->name, "_stext"))
+ else if (y->name && y->name[0] == '_' &&
STREQ(y->name, "_stext"))
st->_stext_vmlinux = valueof(y);
}
if (kt->flags2 & KASLR_CHECK) {
- if (STREQ(x->name, "kaslr_get_random_long") ||
- STREQ(y->name, "kaslr_get_random_long") ||
- STREQ(x->name, "module_load_offset") ||
- STREQ(y->name, "module_load_offset")) {
+ if ((x->name && x->name[0] == 'k' && STREQ(x->name,
"kaslr_get_random_long")) ||
+ (y->name && y->name[0] == 'k' && STREQ(y->name,
"kaslr_get_random_long")) ||
+ (x->name && x->name[0] == 'm' && STREQ(x->name,
"module_load_offset")) ||
+ (y->name && y->name[0] == 'm' && STREQ(y->name,
"module_load_offset"))) {
kt->flags2 &= ~KASLR_CHECK;
kt->flags2 |= (RELOC_AUTO|KASLR);
}
@@ -14429,36 +14429,36 @@ numeric_forward(const void *P_x, const void *P_y)
if (SADUMP_DUMPFILE() || QEMU_MEM_DUMP_NO_VMCOREINFO() || VMSS_DUMPFILE()) {
/* Need for kaslr_offset and phys_base */
- if (STREQ(x->name, "divide_error") ||
- STREQ(x->name, "asm_exc_divide_error"))
+ if ((x->name && x->name[0] == 'd' && STREQ(x->name,
"divide_error")) ||
+ (x->name && x->name[0] == 'a' && STREQ(x->name,
"asm_exc_divide_error")))
st->divide_error_vmlinux = valueof(x);
- else if (STREQ(y->name, "divide_error") ||
- STREQ(y->name, "asm_exc_divide_error"))
+ else if ((y->name && y->name[0] == 'd' &&
STREQ(y->name, "divide_error")) ||
+ (y->name && y->name[0] == 'a' && STREQ(y->name,
"asm_exc_divide_error")))
st->divide_error_vmlinux = valueof(y);
- if (STREQ(x->name, "idt_table"))
+ if (x->name && x->name[0] == 'i' && STREQ(x->name,
"idt_table"))
st->idt_table_vmlinux = valueof(x);
- else if (STREQ(y->name, "idt_table"))
+ else if (y->name && y->name[0] == 'i' &&
STREQ(y->name, "idt_table"))
st->idt_table_vmlinux = valueof(y);
- if (STREQ(x->name, "kaiser_init"))
+ if (x->name && x->name[0] == 'k' && STREQ(x->name,
"kaiser_init"))
st->kaiser_init_vmlinux = valueof(x);
- else if (STREQ(y->name, "kaiser_init"))
+ else if (y->name && y->name[0] == 'k' &&
STREQ(y->name, "kaiser_init"))
st->kaiser_init_vmlinux = valueof(y);
- if (STREQ(x->name, "linux_banner"))
+ if (x->name && x->name[0] == 'l' && STREQ(x->name,
"linux_banner"))
st->linux_banner_vmlinux = valueof(x);
- else if (STREQ(y->name, "linux_banner"))
+ else if (y->name && y->name[0] == 'l' &&
STREQ(y->name, "linux_banner"))
st->linux_banner_vmlinux = valueof(y);
- if (STREQ(x->name, "pti_init"))
+ if (x->name && x->name[0] == 'p' && STREQ(x->name,
"pti_init"))
st->pti_init_vmlinux = valueof(x);
- else if (STREQ(y->name, "pti_init"))
+ else if (y->name && y->name[0] == 'p' &&
STREQ(y->name, "pti_init"))
st->pti_init_vmlinux = valueof(y);
- if (STREQ(x->name, "saved_command_line"))
+ if (x->name && x->name[0] == 's' && STREQ(x->name,
"saved_command_line"))
st->saved_command_line_vmlinux = valueof(x);
- else if (STREQ(y->name, "saved_command_line"))
+ else if (y->name && y->name[0] == 's' &&
STREQ(y->name, "saved_command_line"))
st->saved_command_line_vmlinux = valueof(y);
}
--
2.47.3