On Thu, Mar 30, 2017 at 03:13:27PM -0400, Dave Anderson wrote:
> > If we hit a NULL next pointer on a struct list_head list it means the
> > list is corrupted.
>
> Yeah, that is true -- although it's always been this way and there's never
> been a bug report. I'm curious as to what happened in your case where
> you discovered this?
I received a dump where the kernel had crashed while iterating through a
linked list. We read out the list in crash and it was certainly
corrupted:
crash-arm> list -xH 0x7f047394
be44f544
8ff69004
8ff69f04
8ff693c4
8ffe0e44
8ffe0244
...
8ffb53c4
be448b44
8fc2c3c4
8fc2c0c4
8fc2c604
8fc2cf04
ffff
list: invalid kernel virtual address: ffff type: "list entry"
crash-arm>
Further investigation led us to suspect that this was not a simple case
of a freed element still being on the list, but some other larger memory
corruption. We wanted to find out if there were more corrupted entries
on this list, so we dumped the list in reverse using the .prev pointers:
crash-arm> list -rxH 0x7f047394
b957fcc4
b957f0c4
b4d863c4
bad41904
bad416c4
bad41c04
bad41784
bad41544
be5f4b44
...
8ff7de44
8f7a4c04
8f7a4f04
8fc2c9c4
8fc2c904
8fc2c784
8fc2ce44
crash-arm>
This, surprisingly, terminated successfully.
However, a closer look at the addresses showed that the last elements of
the reverse iteration are not the first elements of the forward
iteration. So crash had silently stopped iteration halfway into the
list. This was because the 8fc2ce44 element had a NULL prev pointer.
crash-arm> struct list_head 8fc2ce44
struct list_head {
  next = 0xffffffff,
  prev = 0x0
}
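To illustrate the failure mode, here is a minimal sketch in C of a reverse walker that treats a NULL link the same way as reaching the list head. This is not crash's actual code; count_reverse_lenient() is a hypothetical name, and it follows in-process pointers rather than reading from a dump. It only shows why such a walk looks successful while being truncated.

```c
#include <stddef.h>

/* Same shape as the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/*
 * Hypothetical lenient walker: follows .prev starting from head and
 * counts the entries it visits. It stops either when it arrives back
 * at head (a healthy circular list) or when it meets a NULL link, and
 * the caller has no way to tell which of the two happened.
 */
static size_t count_reverse_lenient(const struct list_head *head)
{
    size_t n = 0;
    const struct list_head *pos = head->prev;

    while (pos != NULL && pos != head) {
        n++;
        pos = pos->prev;
    }
    return n;
}
```

On a three-entry list where the middle entry's prev has been zeroed, this returns 2 with no indication that anything is wrong, which matches what we saw with list -r.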
Since crash knows that the list is corrupted, it would seem appropriate
for it to alert the user to this fact instead of silently and
successfully terminating the iteration.
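A rough sketch of the behaviour I have in mind, again in plain C and not taken from the crash sources: walk_reverse_strict(), enum walk_status and the message text are all made up for illustration. The only point is that a NULL link is reported as corruption rather than treated as a normal end of list.

```c
#include <stddef.h>
#include <stdio.h>

/* Same shape as the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Hypothetical result codes, so the caller can tell the two exits apart. */
enum walk_status {
    WALK_COMPLETE = 0,   /* arrived back at head: list is circular */
    WALK_CORRUPT  = -1   /* hit a NULL link before reaching head */
};

/*
 * Hypothetical strict walker: follows .prev starting from head and
 * stores the number of entries visited in *visited. A NULL link is
 * reported on stderr and returned as WALK_CORRUPT instead of being
 * treated as the end of the list.
 */
static enum walk_status walk_reverse_strict(const struct list_head *head,
                                            size_t *visited)
{
    const struct list_head *pos = head->prev;

    *visited = 0;
    while (pos != head) {
        if (pos == NULL) {
            fprintf(stderr,
                    "list: NULL prev pointer after %zu entries: "
                    "list is corrupted\n", *visited);
            return WALK_CORRUPT;
        }
        (*visited)++;
        pos = pos->prev;
    }
    return WALK_COMPLETE;
}
```

The entries walked so far are still counted, so the user gets the partial output plus an explicit error. This sketch only checks for NULL because it dereferences real pointers; a bogus address such as the ffff above is a separate case that crash already reports.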