Also:
* Rename struct string to struct nstring and move it to its own header.
* Fix scripts/iwyu.py, which was broken by commit 5541fad063 ("Fix
some flake8 errors").
* Add workarounds for a few outstanding include-what-you-use issues.
There is still a false positive for
include-what-you-use/include-what-you-use#970, but hopefully that is
fixed soon.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Mainly unused imports, unused variables, unnecessary f-strings, and
regex literals missing the r prefix. I'm not adding it to the CI linter
because it's too noisy, though.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The elfutils DWARF unwinder has a couple of limitations:
1. libdwfl doesn't have an interface for getting register values, so we
have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
error even on success, so we have to squirrel away our own errors in
the callback.
Furthermore, there are a couple of things that will be easier with our
own unwinder:
1. Integrating unwinding using ORC will be easier when we're handling
unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
expression evaluation.
Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
libdwfl stores registers in an array of uint64_t indexed by the DWARF
register number. This is suboptimal for a couple of reasons:
1. Although the DWARF specification states that registers should be
numbered for "optimal density", in practice this isn't the case. ABIs
include unused ranges of numbers and don't order registers based on
how likely they are to be known (e.g., caller-saved registers usually
aren't recovered while unwinding the stack, but they are often
numbered before callee-saved registers).
2. This precludes support for registers larger than 64 bits, like SSE
registers.
For our own unwinder, we want to store registers in an
architecture-specific format to solve both of these problems.
So, have each architecture define its layout with registers arranged for
space efficiency and convenience when parsing saved registers from core
dumps. Instead of generating an arch_foo.c file from arch_foo.c.in,
separately define the logical register order in an arch_foo.defs file,
and use it to generate an arch_foo.inc file that is included from
arch_foo.c. The layout is defined as a macro in arch_foo.c. While we're
here, drop some register definitions that aren't useful at the moment.
Then, define struct drgn_register_state to efficiently store registers
in the defined format.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.
While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
I've been wanting to add type hints for the _drgn C extension for
awhile. The main blocker was that there is a large overlap between the
documentation (in docs/api_reference.rst) and the stub file, and I
really didn't want to duplicate the information. Therefore, it was a
requirement that the the documentation could be generated from the stub
file, or vice versa. Unfortunately, none of the existing tools that I
could find supported this very well. So, I bit the bullet and wrote my
own Sphinx extension that uses the stub file as the source of truth (and
subsumes my old autopackage extension and gen_docstrings script).
The stub file is probably incomplete/inaccurate in places, but this
should be a good starting point to improve on.
Closes#22.
Currently the drgn version number is defined in drgn.h.in, and configure
and setup.py both parse it out of there. However, now that we're
generating drgn.h anyways, it's easier to make configure.ac the source
of truth.
In order to retrieve registers from stack traces, we need to know what
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.
I didn't want to use BUILT_SOURCES before because that would break make
$TARGET. But, now that doesn't work anyways because we're using SUBDIRS,
so we might as well use BUILT_SOURCES.
For stack trace support, we'll need to have some architecture-specific
functionality. drgn's current notion of an architecture doesn't actually
include the instruction set architecture. This change expands it to a
"platform", which includes the ISA as well as the existing flags.
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
drgn has pretty thorough in-program documentation, but it doesn't have a
nice overview or introduction to the basic concepts. This commit adds
that using Sphinx. In order to avoid documenting everything in two
places, the libdrgn bindings have their docstrings generated from the
API documentation. The alternative would be to use Sphinx's autodoc
extension, but that's not as flexible and would also require building
the extension to build the docs. The documentation for the helpers is
generated using autodoc and a small custom extension.
I went back and forth on using setuptools or autotools for the Python
extension, but I eventually settled on using only setuptools after
fighting to get the two to integrate well. However, setuptools is kind
of crappy; for one, it rebuilds every source file when rebuilding the
extension, which is really annoying for development. automake is a
better designed build system overall, so let's use that for the
extension. We override the build_ext command to build using autotools
and copy things where setuptools expects them.
The current mixed Python/C implementation works well, but it has a
couple of important limitations:
- It's too slow for some common use cases, like iterating over large
data structures.
- It can't be reused in utilities written in other languages.
This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:
- Types are now represented by a single Type class rather than the messy
polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
functions.
The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.
Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.