Currently, we load debug information for every kernel module that we
find under /lib/modules/$(uname -r)/kernel. This has a few issues:
1. Distribution kernels have many modules (~3000 for Fedora and
Debian).
a) This can exceed the default soft limit on the number of open file
descriptors.
b) The mmap'd debug information can trip the overcommit heuristics
and cause OOM kills.
c) It can take a long time to parse all of the debug information.
2. Not all modules are under the "kernel" directory; some distros also
have an "extra" directory.
3. The user is not made aware of loaded kernel modules that don't have
debug information available.
So, instead of walking /lib/modules, walk the list of loaded kernel
modules and look up the debug information for each one.
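A minimal Python sketch of the new lookup order, for illustration only. It assumes the list of loaded modules comes from /proc/modules (one module per line, name first) and that candidate files are plain .ko files anywhere under /lib/modules/$(uname -r), including "extra". The function names are hypothetical, and real debug information often lives in separate debuginfo files instead. The point is that only loaded modules are opened, and loaded modules without a match are reported instead of silently skipped.

```python
import os
import platform
from typing import Dict, List, Optional


def loaded_module_names(proc_modules_text: str) -> List[str]:
    # /proc/modules has one module per line; the first field is its name.
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]


def index_module_files(root: str) -> Dict[str, str]:
    # Walk every directory under root (kernel/, extra/, ...), not just kernel/.
    # Loaded module names use '_' even when the file name uses '-', so normalize.
    index: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".ko"):
                name = filename[: -len(".ko")].replace("-", "_")
                index.setdefault(name, os.path.join(dirpath, filename))
    return index


def match_loaded_modules(
    names: List[str], index: Dict[str, str]
) -> Dict[str, Optional[str]]:
    # Only loaded modules are looked up; None marks missing debug information.
    return {name: index.get(name) for name in names}


if __name__ == "__main__":
    with open("/proc/modules") as f:
        names = loaded_module_names(f.read())
    # platform.release() is the same string that `uname -r` prints.
    root = os.path.join("/lib/modules", platform.release())
    matches = match_loaded_modules(names, index_module_files(root))
    for name, path in sorted(matches.items()):
        if path is None:
            print(f"warning: no debug information found for module {name}")
```

Because only the loaded modules (typically far fewer than the ~3000 shipped) would be opened, the number of file descriptors and mappings scales with what is actually in use.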
The current mixed Python/C implementation works well, but it has a
couple of important limitations:
- It's too slow for some common use cases, like iterating over large
data structures.
- It can't be reused in utilities written in other languages.
This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:
- Types are now represented by a single Type class rather than the messy
polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are no longer considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program now distinguishes between lookups of variables, constants,
and functions.
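To illustrate the qualifier change above, here is a small Python sketch contrasting a set of strings with a bitmask. The Qualifiers class, its member names, and its bit values are assumptions made for this example and are not necessarily what libdrgn uses; describe() is likewise a hypothetical helper. With a bitmask, combining and testing qualifiers are single bitwise operations, and a misspelled qualifier is an attribute error rather than a silently accepted string.

```python
import enum


class Qualifiers(enum.Flag):
    # Hypothetical bit assignments; each qualifier occupies one bit.
    NONE = 0
    CONST = 1 << 0
    VOLATILE = 1 << 1
    RESTRICT = 1 << 2
    ATOMIC = 1 << 3


# C spelling of each qualifier, in the order they would be printed.
_C_NAMES = [
    (Qualifiers.CONST, "const"),
    (Qualifiers.VOLATILE, "volatile"),
    (Qualifiers.RESTRICT, "restrict"),
    (Qualifiers.ATOMIC, "_Atomic"),
]


def describe(qualifiers: Qualifiers) -> str:
    # Test each bit with `in` and emit the matching C keyword.
    return " ".join(name for flag, name in _C_NAMES if flag in qualifiers)


# Old style: an unordered set of strings, which would also accept "cosnt".
old_style = {"const", "volatile"}
# New style: one integer-backed value built with bitwise OR.
new_style = Qualifiers.CONST | Qualifiers.VOLATILE
```

Here describe(new_style) yields "const volatile" and new_style.value is 3, which maps directly onto a C bitmask field.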
The C rewrite is about 6x as fast as the original Python implementation
when used through the Python bindings, and about 8x as fast when using
the C API directly.
Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.