setuptools has a long-standing bug where files that are removed from the
list of sources but were included in a previous run of egg_info remain
in the generated list of sources (pypa/setuptools#436). This affects
both egg_info and sdist. Let's work around it by removing the old
SOURCES.txt when we can recreate it from git.
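A minimal sketch of the workaround follows. It assumes the egg-info directory path is already known and that the caller has obtained the `git ls-files` output beforehand, passing None when git isn't available. `remove_stale_sources_txt` is a hypothetical helper, not setuptools API.

```python
import os
from typing import List, Optional


def remove_stale_sources_txt(egg_info_dir: str,
                             tracked_files: Optional[List[str]]) -> bool:
    """Delete SOURCES.txt so that egg_info regenerates it from scratch.

    Hypothetical helper: tracked_files is the output of `git ls-files`,
    or None if we're not in a git checkout. Without git we can't rebuild
    the file list, so the old SOURCES.txt is left alone.
    """
    if tracked_files is None:
        return False
    try:
        os.remove(os.path.join(egg_info_dir, "SOURCES.txt"))
    except FileNotFoundError:
        pass  # Nothing stale to remove.
    return True
```

In a setup.py, this would presumably be called from an egg_info command subclass's run(), using the command's egg_info directory attribute, before delegating to the base class.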
In order to retrieve registers from stack traces, we need to know which
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.
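The real generator is an awk script. The Python below is only a stand-in that shows the same idea: parse a register list and emit an enum, a name table, and a lookup function as C source. The DSL syntax (one "name number" pair per line, with '#' comments) and all generated identifiers are hypothetical.

```python
from typing import List, Tuple


def parse_registers(dsl: str) -> List[Tuple[str, int]]:
    """Parse a hypothetical register DSL: one "name number" pair per
    line, where number is the register's DWARF number and '#' starts a
    comment."""
    registers = []
    for lineno, line in enumerate(dsl.splitlines(), 1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'name number'")
        registers.append((fields[0], int(fields[1])))
    return registers


def generate_c(arch: str, registers: List[Tuple[str, int]]) -> str:
    """Emit C source for an enum, a name table, and a lookup function."""
    prefix = arch.upper()
    out = ["#include <stddef.h>", "#include <string.h>", ""]
    out.append(f"enum {arch}_register {{")
    out += [f"\t{prefix}_REG_{name.upper()} = {num}," for name, num in registers]
    out += ["};", ""]
    out.append(f"static const struct {{ const char *name; int number; }} "
               f"{arch}_registers[] = {{")
    out += [f'\t{{ "{name}", {num} }},' for name, num in registers]
    out += ["};", ""]
    # A linear scan keeps the sketch short; a real generator could emit a
    # switch statement or a perfect hash instead.
    out += [
        f"static int {arch}_register_number(const char *name)",
        "{",
        f"\tfor (size_t i = 0; i < sizeof({arch}_registers) / sizeof({arch}_registers[0]); i++) {{",
        f"\t\tif (strcmp({arch}_registers[i].name, name) == 0)",
        f"\t\t\treturn {arch}_registers[i].number;",
        "\t}",
        "\treturn -1;",
        "}",
    ]
    return "\n".join(out) + "\n"
```

For example, `generate_c("x86_64", parse_registers("rax 0\nrsp 7\n"))` yields an `x86_64_register` enum containing `X86_64_REG_RSP = 7` along with a matching lookup function.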
I was always copying the built extension, based on a comment in the
equivalent setuptools code [1]:
# Always copy, even if source is older than destination, to ensure
# that the right extensions for the current Python/platform are
# used.
On closer inspection, this hasn't been relevant since PEP 3149 [2],
because extension filenames are tagged with the ABI version. Let's avoid
the unnecessary copies.
While we're here, let's restore self.inplace just to be safe.
1: e00f4a87ad
2: https://www.python.org/dev/peps/pep-3149/
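The sketch below shows one way to skip unnecessary copies. It assumes an mtime comparison in the style of distutils' "only if newer" behavior. The helper name `copy_if_newer` is hypothetical. Because of PEP 3149, an existing destination with the same ABI-tagged name must have been built for the same interpreter ABI, so its age is the only thing left to check.

```python
import os
import shutil
import sysconfig


def copy_if_newer(src: str, dst: str) -> bool:
    """Copy src to dst only if dst is missing or older than src.

    Extension filenames carry an ABI tag (PEP 3149), so a dst built for
    a different interpreter would have a different name; comparing
    modification times is therefore enough.
    """
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False
    shutil.copy2(src, dst)  # copy2 also copies the modification time.
    return True


# The ABI-tagged suffix, e.g. ".cpython-312-x86_64-linux-gnu.so" on Linux.
EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX")
```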
Now that we have the bundled version of elfutils, build it from libdrgn
and link to it. We can also get rid of the elfutils version checks from
the libdrgn code.
Currently, we have a special Makefile target to output the files for a
libdrgn source tarball, and we use that for setuptools. However, the
next change is going to import elfutils, and it'd be a pain to add the
same thing for the elfutils sources. Instead, let's just use git
ls-files for everything. The only difference is that source
distributions won't have the autoconf/automake output.
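Here is a sketch of collecting the source list with `git ls-files` rather than a Makefile target. `git_sources` and `parse_ls_files` are hypothetical helper names. The `-z` option makes git separate paths with NUL bytes, so filenames containing unusual characters survive intact.

```python
import subprocess
from typing import List


def parse_ls_files(output: bytes) -> List[str]:
    """Split NUL-separated `git ls-files -z` output into paths."""
    return [path.decode() for path in output.split(b"\0") if path]


def git_sources(*dirs: str) -> List[str]:
    """List the files git tracks under the given directories.

    Hypothetical helper: raises CalledProcessError outside a git
    checkout and FileNotFoundError if git isn't installed.
    """
    output = subprocess.check_output(["git", "ls-files", "-z", "--", *dirs])
    return parse_ls_files(output)
```

A call such as `git_sources("libdrgn")` would then cover any bundled sources that live under that directory without needing a separate rule for them.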
autoreconf may successfully run autoconf but not automake, so we should
check that Makefile.in exists, not configure. Additionally, there seem
to be some cases where configure fails but Makefile is still generated.
Make sure we delete the potentially broken output if autoreconf or
configure fails.
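The following is a minimal sketch of those checks. The helper name `configure` and its injectable `run` parameter are assumptions made for illustration (the parameter also makes the logic testable without autotools installed).

```python
import contextlib
import os
import subprocess


def _remove(path: str) -> None:
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def configure(srcdir: str, builddir: str, run=subprocess.check_call) -> None:
    """Run autoreconf and configure as needed (hypothetical helper).

    Checks for Makefile.in rather than configure, since autoreconf can
    produce configure without automake's Makefile.in. On any failure
    (including Ctrl-C), removes the possibly broken output so that the
    next build retries the step.
    """
    makefile_in = os.path.join(srcdir, "Makefile.in")
    if not os.path.exists(makefile_in):
        try:
            run(["autoreconf", "-i"], cwd=srcdir)
        except BaseException:
            _remove(makefile_in)
            raise

    makefile = os.path.join(builddir, "Makefile")
    if not os.path.exists(makefile):
        os.makedirs(builddir, exist_ok=True)
        try:
            run([os.path.join(os.path.abspath(srcdir), "configure")], cwd=builddir)
        except BaseException:
            _remove(makefile)
            raise
```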
I went back and forth on using setuptools or autotools for the Python
extension, but I eventually settled on using only setuptools after
fighting to get the two to integrate well. However, setuptools is kind
of crappy; for one, it rebuilds every source file when rebuilding the
extension, which is really annoying for development. automake is a
better designed build system overall, so let's use that for the
extension. We override the build_ext command to build using autotools
and copy things where setuptools expects them.
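A sketch of the build step that an overridden build_ext might delegate to is shown below. The helper name, the libtool output location (`builddir/.libs/<name>`), and the `run` parameter are all assumptions. In an actual build_ext subclass, the destination would presumably come from the command's `get_ext_fullpath()` method.

```python
import os
import shutil
import subprocess
from typing import Optional


def build_ext_with_make(builddir: str, built_name: str, dest: str,
                        jobs: Optional[int] = None,
                        run=subprocess.check_call) -> None:
    """Build with make, then copy the result where setuptools expects it.

    Hypothetical helper: built_name is assumed to be the libtool module
    left in builddir/.libs (e.g. "_drgn.so").
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    # make only recompiles sources whose dependencies changed, unlike
    # setuptools, which rebuilds every source file.
    run(["make", "-C", builddir, f"-j{jobs}"])
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    shutil.copy2(os.path.join(builddir, ".libs", built_name), dest)
```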
The current mixed Python/C implementation works well, but it has a
couple of important limitations:
- It's too slow for some common use cases, like iterating over large
data structures.
- It can't be reused in utilities written in other languages.
This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:
- Types are now represented by a single Type class rather than the messy
polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
functions.
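To illustrate why a bitmask is handier than a set of strings, here is a stand-in built on Python's `enum.Flag`. The class and helper below are modeled for this example and are not guaranteed to match libdrgn's actual definitions.

```python
import enum


class Qualifiers(enum.Flag):
    """Type qualifiers as a bitmask (illustrative stand-in)."""
    NONE = 0
    CONST = 1
    VOLATILE = 2
    RESTRICT = 4
    ATOMIC = 8


# C spellings in canonical order; the order is a choice for this sketch.
_SPELLINGS = {
    Qualifiers.CONST: "const",
    Qualifiers.VOLATILE: "volatile",
    Qualifiers.RESTRICT: "restrict",
    Qualifiers.ATOMIC: "_Atomic",
}


def qualified_name(base: str, qualifiers: Qualifiers) -> str:
    """Prefix a type name with its qualifiers, e.g. "const volatile int"."""
    words = [word for q, word in _SPELLINGS.items() if q in qualifiers]
    return " ".join(words + [base])
```

Combining and testing qualifiers then becomes a single `|` or `in` on an integer mask. There's no need to normalize or compare strings.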
The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.
Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
CompoundType.members() currently returns a list of member names;
sometimes, we actually want the type and offset. So, rename members() to
member_names(), and make members() return the type and offset, using a
newly added version of functools.partial() that caches the return value.
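A minimal sketch of a caching partial follows. The name `cached_partial` is hypothetical, and this version isn't thread-safe. Each member's type can be wrapped in one, so it's resolved lazily and at most once, no matter how often the caller asks for it.

```python
import functools


class cached_partial:
    """Like functools.partial, but the wrapped call runs at most once.

    Later calls return the cached result. If the call raises, nothing is
    cached and the next call tries again.
    """

    def __init__(self, func, *args, **kwargs):
        self._partial = functools.partial(func, *args, **kwargs)
        self._done = False
        self._value = None

    def __call__(self):
        if not self._done:
            self._value = self._partial()
            self._done = True
            self._partial = None  # Drop references to func and its arguments.
        return self._value
```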
A few tricky things to handle:
- The DIE hash needs to be thread-safe. This implements a lock-free
algorithm.
- We need to make sure exceptions are only raised with the GIL held.
This adds some hacky macro magic to avoid adding a bunch of
easy-to-forget boilerplate everywhere.
- We need to make sure the main thread reraises any exceptions thrown by
the worker threads.
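The last point can be illustrated with a pure-Python analogue. The actual workers are native threads in the extension, and this sketch doesn't model the lock-free DIE hash or the GIL macros. `run_workers` is a hypothetical name. Each worker saves its exception, and once every thread has been joined, the main thread reraises the exception from the lowest-numbered worker that failed.

```python
import threading
from typing import Callable, List, Optional


def run_workers(work: Callable[[int], None], num_threads: int) -> None:
    """Run work(i) on num_threads threads and reraise any failure here.

    An exception escaping a threading.Thread is otherwise only printed
    (via threading.excepthook) and never reaches the caller.
    """
    errors: List[Optional[BaseException]] = [None] * num_threads

    def target(i: int) -> None:
        try:
            work(i)
        except BaseException as e:  # Save it for the main thread.
            errors[i] = e

    threads = [threading.Thread(target=target, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # Wait for every worker before reporting anything.
    for e in errors:
        if e is not None:
            raise e
```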
Most of drgn.dwarf is not performance-sensitive, and the part that is
(DwarfIndex) could use some extra tuning, which is easier to do in C
than in Cython.
The lldwarf/drgn.dwarf split wasn't working out too well, and moving all
of drgn.dwarf into lldwarf (by rewriting it in C) would be way too much
work. Instead, use Cython, which results in a parser that is just as
fast but with much cleaner code overall. It also turns out lldwarf
wasn't handling garbage collection correctly, so the switch fixed that,
too.
Resolving parameters, variables with function scope, and global
variables should work. This is only variable resolution, with no
fetching yet, but a bunch of refactors snuck in, so I'm committing it
all now.
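The lookup order can be sketched with ordinary lexical scoping. Scopes are searched from the innermost outward, covering nested blocks, then the function's own variables and parameters, and finally globals. This is a simplified, hypothetical model in which each scope is a plain dict. It is not the actual DIE-walking implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each scope maps a name to a description of the
# declaration (parameter, local variable, or global variable).
Scope = Dict[str, str]


def resolve_variable(name: str, function_scopes: List[Scope],
                     global_scope: Scope) -> Optional[Tuple[int, str]]:
    """Look a name up from the innermost scope outward.

    function_scopes is ordered innermost first. Returns (depth,
    declaration), where depth -1 means the global scope, or None if
    the name isn't found anywhere.
    """
    for depth, scope in enumerate(function_scopes):
        if name in scope:
            return depth, scope[name]
    if name in global_scope:
        return -1, global_scope[name]
    return None
```

Searching innermost-first means that a local or parameter shadows a global with the same name, as it does in C.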
I wrote all of this code a few months back and am just now getting
around to committing it. The low-level DWARF parsing library is pretty
solid, although it only implements a subset of DWARF so far. The CLI and
higher-level interface are experimental.