-----Original Message-----
Hello Kazu,
RTLD_DEEPBIND is not a good way. I'd like to recall the patch.
After my research, I find making functions and global variables static within extensions
can avoid such
problems.
Good find!
I also thought that RTLD_LOCAL could be a workaround, but crash changelog [1]
says:
- The dlopen() call used by the "extensions" facility has been changed
to use the RTLD_GLOBAL flag, so that symbols from an extension object
will be visable to subsequently loaded modules. (asid(a)hp.com)
(12/20/06)
and it looks like Pykdump needs the RTLD_GLOBAL [2], so it cannot be changed.
[1]
https://crash-utility.github.io/crash.changelog.html
[2]
https://listman.redhat.com/archives/crash-utility/2006-December/msg00030....
Taking extensions/trace.so as example, trace_fini and trace_init are non-static
functions, if we load
trace.so first, everything works fine, but when we rename trace.so to trace2.so and load
it again, we are
expecting trace_init within trace2.so to be called, but since we have RTLD_GLOBAL,
actually it's the
trace_init within trace.so to be called. It causes problems because currently some global
variables within
trace.so are not in their initial status.
As I quoting dlopen manual:
Symbol references in the shared object are resolved using (in order): symbols in the link
map of objects
loaded for the main program and its dependencies; symbols in shared objects (and their
dependencies) that
were previously opened with dlopen() using the RTLD_GLOBAL flag; and definitions in the
shared object itself
(and any dependencies that were loaded for that object).
When using readelf --dyn-syms trace.so, for non-static trace_fini and trace_init, I got:
Symbol table '.dynsym' contains 66 entries:
....
64: 0000000000007e65 22 FUNC GLOBAL DEFAULT 11 trace_fini
65: 0000000000007e3d 40 FUNC GLOBAL DEFAULT 11 trace_init
Since I'm not an expert to linkers and loaders, I guess the process is:
1. when loading trace.so
2. ld.so sees trace_init and trace_fini in .dynsym
3. ld.so searches the 2 symbols in main program, RTLD_GLOBAL shared objects, and
trace.so.
4. ld.so finds them in trace.so, resolves their address.
5. everything works fine.
6. when loading trace2.so
7. ld.so sees trace_init and trace_fini in .dynsym
8. ld.so searches the 2 symbols in main program, RTLD_GLOBAL shared objects, and
trace2.so.
9. ld.so finds them in RTLD_GLOBAL shared objects(trace.so), resolves their address.
10. trace_init in trace.so is called again, but global variables are not in initial
status, fails.
If I make trace_fini and trace_init static, readelf --dyn-syms trace.so will no longer
give the 2 symbols
in .dynsym, and ld.so will no longer resolve them to the wrong place, then it works
fine.
I think this cannot be a code patch, but a document for future extension developers, if
you don't want to
export a symbol to subsequent extensions, making it static.
Yes, that would be kind to extension module developers. And it might also be
better to change the functions and global variables of echo.c to static, that
is a template and example for most extension modules.
Thanks,
Kazu
On Mon, Mar 22, 2021 at 1:35 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com
<mailto:k-hagio-ab@nec.com>
> wrote:
Hi Tao Liu,
-----Original Message-----
> If a same extension(Eg: extensions/trace.so) with two different names are loaded
by
> "extend" command twice, it sometimes segfaults crash.
>
> It's because crash uses RTLD_NOW|RTLD_GLOBAL flags of dlopen to load an
extension.
> RTDL_GLOBAL will make symbols defined by this shared object available for
> symbol resolution of subsequently loaded shared objects. So symbols with the same
> name will be exported from the former to the latter. In this case, when 2
extensions
> only differ from file names, the subsequently loaded extension will have
unexpected
> initial values for global varibles.
Thanks for the report.
> This patch adds RTLD_DEEPBIND flag to dlopen, making extensions using its
> own symbols preference to symbols with the same name contained by others.
This looks a big API change for crash extension modules.
As far as I've tested, getopt() in an extension module does not work well
with this patch:
# make extensions
crash> extend extensions/echo.so
./extensions/echo.so: shared object loaded
crash> echo test
test
crash> echo test
crash> echo test test2
test2
crash> echo test test2
crash> echo test
crash> echo test test2
crash> echo test test2 test3
test3
Can we fix this? And probably all other modules using getopt() imitates
this echo.c, they will also need to be fixed to adopt the patch.
Also I'm concerned that there might be another regression.
Do we need to fix the issue at these costs? or is there any better way?
Thanks,
Kazu
>
> Signed-off-by: Tao Liu <ltao(a)redhat.com <mailto:ltao@redhat.com> >
> ---
> extensions.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/extensions.c b/extensions.c
> index d23b1e3..e07f9a9 100644
> --- a/extensions.c
> +++ b/extensions.c
> @@ -317,7 +317,7 @@ load_extension(char *lib)
> * _init() function before dlopen() returns below.
> */
> pc->curext = ext;
> - ext->handle = dlopen(ext->filename, RTLD_NOW|RTLD_GLOBAL);
> + ext->handle = dlopen(ext->filename,
RTLD_NOW|RTLD_GLOBAL|RTLD_DEEPBIND);
>
> if (!ext->handle) {
> strcpy(buf, dlerror());
> --
> 2.29.2
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com <mailto:Crash-utility@redhat.com>
>
https://listman.redhat.com/mailman/listinfo/crash-utility
--
Crash-utility mailing list
Crash-utility(a)redhat.com <mailto:Crash-utility@redhat.com>
https://listman.redhat.com/mailman/listinfo/crash-utility