Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the dwarf index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
I've been planning to make memory readers pluggable (in order to support
use cases like, e.g., reading a core file over the network), but the
C-style "inheritance" drgn uses internally is awkward as a library
interface; it's much easier to just register a callback. This change
effectively makes drgn_memory_reader a mapping from a memory range to an
arbitrary callback. As a bonus, this means that read callbacks can be
mixed and matched; a part of memory can be in a core file, another part
can be in the executable file, and another part could be filled from an
arbitrary buffer.
Currently, we deduplicate files for userspace mappings manually.
However, to prepare for adding symbol files at runtime, move the
deduplication to DWARF index. In the future, we probably want to
deduplicate based on build ID, as well.
Relocations are only supposed to be applied to ET_REL files, not ET_EXEC
files like vmlinux. This hasn't been an issue with the kernel builds
that I've tested on because the relocations match the contents of the
section. However, on Fedora, the relocation sections don't match,
probably because they post-process the binary in some way. This leads to
completely bogus debug information being parsed by drgn_dwarf_index. Fix
it by only relocating ET_REL files.
We need to set the value after we've reinitialized the object, otherwise
drgn_object_deinit() may try to free a buffer that we've already
overwritten. This also adds a test which triggers the crash.
There's a bug that we don't allow comparisons between void * and other
pointer types, so let's fix it by allowing all pointer comparisons
regardless of the referenced type. Although this isn't valid by the C
standard, GCC and Clang both allow it by default (with a warning).
drgn has pretty thorough in-program documentation, but it doesn't have a
nice overview or introduction to the basic concepts. This commit adds
that using Sphinx. In order to avoid documenting everything in two
places, the libdrgn bindings have their docstrings generated from the
API documentation. The alternative would be to use Sphinx's autodoc
extension, but that's not as flexible and would also require building
the extension to build the docs. The documentation for the helpers is
generated using autodoc and a small custom extension.
I went back and forth on using setuptools or autotools for the Python
extension, but I eventually settled on using only setuptools after
fighting to get the two to integrate well. However, setuptools is kind
of crappy; for one, it rebuilds every source file when rebuilding the
extension, which is really annoying for development. automake is a
better designed build system overall, so let's use that for the
extension. We override the build_ext command to build using autotools
and copy things where setuptools expects them.
The declaration file name of a DIE depends on the compilation directory,
which may not always be what the user expects. Instead, make the search
match as long as the full declaration file name ends with the given file
name. This is more convenient and more intuitive.
Running tests with Clang's AddressSanitizer fails with "runtime error:
index 1 out of bounds for type 'struct drgn_type_member [0]'". Zero
length arrays are a GCC extension and aren't buying us much anyways, so
just add a helper function that gets the array payload using pointer
arithmetic.
Older versions of Clang generate a call to __muloti4() for
__builtin_mul_overflow() with mixed signed and unsigned types. However,
Clang doesn't link to compiler-rt by default. Work around it by making
all of our calls to __builtin_mul_overflow() use unsigned types only.
1: https://bugs.llvm.org/show_bug.cgi?id=16404
The current mixed Python/C implementation works well, but it has a
couple of important limitations:
- It's too slow for some common use cases, like iterating over large
data structures.
- It can't be reused in utilities written in other languages.
This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:
- Types are now represented by a single Type class rather than the messy
polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
functions.
The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.
Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.