----- Original Message -----
Hi Dave
I would like to discuss the usage of FAULT_ON_ERROR in readmem calls.
I have now seen a number of situations where this prevents Crash from
producing appropriate results when some memory is corrupt.
The last problem I saw a few days ago was in kernel.c, in the function
dumplog:
    readmem(log_buf, KVADDR, buf, log_buf_len, "log_buf contents", FAULT_ON_ERROR);
The problem was that log_buf_len contained a very large value (memory
overwrite?) so the readmem failed due to the size. This means of
course that it was not possible to print the log, but as this
function is called during Crash startup it also had the consequence
that Crash terminated during startup. By just changing
FAULT_ON_ERROR to RETURN_ON_ERROR and returning if the
readmem failed, I could use Crash to investigate this vmcore file,
except for printing the log.
Right -- in fact for the new Linux 3.5 variable length record log buffer
format, I do use RETURN_ON_ERROR. But the older format that you reference
can be changed to RETURN_ON_ERROR as well.
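To illustrate the change discussed above, here is a minimal, self-contained
C sketch. It does not link against crash: mock_readmem(), the simulated
memory, session_killed, and the flag values are stand-ins invented for this
example, and only the call shape follows the readmem() line quoted earlier.
It shows a log dump that skips the log and returns when the read fails,
rather than ending the session.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for crash's definitions; the values here are assumptions. */
#define TRUE  1
#define FALSE 0
#define KVADDR 1
#define FAULT_ON_ERROR  0x1
#define RETURN_ON_ERROR 0x2

#define SIM_MEM_SIZE 64   /* pretend the dump holds only 64 readable bytes */
static char sim_mem[SIM_MEM_SIZE] = "<6>booting\n<3>oops\n";
static int session_killed; /* set where a fatal read would end the session */

/* Mock of readmem(): fails when the request runs past the simulated memory. */
static int mock_readmem(unsigned long addr, int memtype, void *buf, long size,
                        const char *what, unsigned long error_handle)
{
    (void)memtype;
    assert(buf != NULL);
    if (size < 0 || addr > SIM_MEM_SIZE ||
        (unsigned long)size > SIM_MEM_SIZE - addr) {
        fprintf(stderr, "read error: %s (size %ld)\n", what, size);
        if (error_handle & FAULT_ON_ERROR)
            session_killed = 1;
        return FALSE;
    }
    memcpy(buf, sim_mem + addr, (size_t)size);
    return TRUE;
}

/* Returns TRUE if the log was printed, FALSE if it had to be skipped. */
static int dump_log_sketch(unsigned long log_buf, long log_buf_len)
{
    char *buf;

    if (log_buf_len <= 0)
        return FALSE;
    buf = malloc((size_t)log_buf_len);
    if (buf == NULL)
        return FALSE;

    if (!mock_readmem(log_buf, KVADDR, buf, log_buf_len,
                      "log_buf contents", RETURN_ON_ERROR)) {
        free(buf);
        return FALSE;   /* no log output, but the session continues */
    }
    fwrite(buf, 1, (size_t)log_buf_len, stdout);
    free(buf);
    return TRUE;
}
```

With a sane length the log is printed. With a corrupted, oversized
log_buf_len the function returns FALSE and the caller can carry on.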
A second place where I have made some patches in Crash is in the
function arm_uvtop (arm.c). In the readmem calls in this function I have
changed FAULT_ON_ERROR to RETURN_ON_ERROR and just added a
"return FALSE;" if the readmem fails. Unfortunately I do not remember the
details of why I made this change, but I think there was a case where
Crash terminated during startup, and with these changes it was
possible to investigate the vmcore file.
Right, from time to time when these situations come up, they get handled
on a case-by-case basis *if* it's possible to safely continue.
Another situation I have seen is in helper functions like
fill_vma_cache and fill_file_cache. When I use these functions in
extensions the commands will fail and terminate immediately if a
readmem call fails. In several cases I could easily handle such a
failure and the command could still produce a lot of relevant
results.
Right -- but if a legitimate vm_area_struct or file struct address is
unreadable, then something is clearly wrong with the dumpfile.
If your extension module has the capability of passing a bogus
vm_area_struct or file structure address, then perhaps you should
call "accessible(vaddr)" first? Or perhaps you're calling some other
function that in turn calls one of them? If that's true, then you
should definitely use accessible() first...
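The "check first" pattern suggested above might look roughly like the
following sketch. It is self-contained and does not use crash itself:
mock_accessible() stands in for accessible(), mock_fill_cache() stands in
for a helper such as fill_vma_cache() that treats an unreadable address as
fatal, and the readable address range is invented for the example.

```c
#include <assert.h>
#include <stdio.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical stand-in for accessible(): only one range is readable. */
static int mock_accessible(unsigned long vaddr)
{
    return vaddr >= 0x1000UL && vaddr < 0x2000UL;
}

/* Stand-in for a helper that cannot recover from an unreadable address. */
static void mock_fill_cache(unsigned long vaddr)
{
    /* The caller is expected to have validated vaddr already. */
    assert(mock_accessible(vaddr));
}

/* Returns TRUE if the structure was cached, FALSE if it was skipped. */
static int show_vma_sketch(unsigned long vma)
{
    if (!mock_accessible(vma)) {
        fprintf(stderr, "skipping unreadable vm_area_struct at %lx\n", vma);
        return FALSE;
    }
    mock_fill_cache(vma);
    printf("vm_area_struct %lx cached\n", vma);
    return TRUE;
}
```

The extension command decides what to do with a bad address before the
fatal helper is ever reached, so one bogus pointer does not end the command.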
In the plugins I write I use RETURN_ON_ERROR practically everywhere
and of course then handle the error situations myself. I have done
this to avoid situations like the ones described above.
As you should...
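As a rough illustration of handling the error in the plugin itself, the
sketch below walks a few simulated records, reports the ones that can be
read, and counts the ones that cannot. Everything in it is hypothetical:
the sim_files table, mock_readmem(), and the flag values are invented
stand-ins, not crash's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for crash's definitions; the values here are assumptions. */
#define TRUE  1
#define FALSE 0
#define KVADDR 1
#define RETURN_ON_ERROR 0x2

/* Simulated dump contents: record 2 is "corrupt" and cannot be read. */
struct sim_file { int readable; unsigned long f_count; };
static const struct sim_file sim_files[] = { {1, 3}, {1, 1}, {0, 0}, {1, 7} };
#define SIM_NFILES (sizeof(sim_files) / sizeof(sim_files[0]))

/* Mock of readmem(): the "address" is simply an index into sim_files. */
static int mock_readmem(unsigned long addr, int memtype, void *buf, long size,
                        const char *what, unsigned long error_handle)
{
    (void)memtype; (void)what; (void)error_handle;
    assert(buf != NULL);
    if (addr >= SIM_NFILES || !sim_files[addr].readable ||
        size != (long)sizeof(unsigned long))
        return FALSE;
    memcpy(buf, &sim_files[addr].f_count, sizeof(unsigned long));
    return TRUE;
}

/* Prints every readable record; returns how many were shown. */
static int list_files_sketch(int *failed)
{
    unsigned long addr, count;
    int shown = 0;

    *failed = 0;
    for (addr = 0; addr < SIM_NFILES; addr++) {
        if (!mock_readmem(addr, KVADDR, &count, (long)sizeof(count),
                          "file f_count", RETURN_ON_ERROR)) {
            printf("file %lu: <unreadable>\n", addr);
            (*failed)++;
            continue;   /* keep going, the other records may be fine */
        }
        printf("file %lu: f_count %lu\n", addr, count);
        shown++;
    }
    return shown;
}
```

The command still produces output for the readable records and can report
how many entries had to be skipped.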
I am not asking you to remove most usage of FAULT_ON_ERROR, as I
realize the size and risk of such changes. However I would like
to bring up this question and hear your views. When working with
vmcore files with minor memory corruptions, using FAULT_ON_ERROR
will limit the usability of Crash.
As I mentioned above, the FAULT_ON_ERROR cases are meant to protect
you from continuing down a path which is doomed. But certainly
in cases where the session can be continued with confidence,
especially during initialization where a FAULT_ON_ERROR failure kills
the session, then those cases should be addressed.
On the other hand, making wholesale changes to handle "minor memory
corruptions" is dangerous. In fact, what is a "minor memory corruption"?
If the crash utility gets tripped up because the kernel has corrupted
its own memory, then you could argue that crash is doing its job.
But again, I would certainly consider any changes to RETURN_ON_ERROR
on a case-by-case basis. The cmd_log() example is a good one -- I'll
fix that for crash-6.1.0.
Thanks,
Dave