I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.
Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.
So, let's reimagine types as being owned by a program. This has a few
parts:
1. In libdrgn, simple types are now created by factory functions,
drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
Program class.
5. While we're changing the API, the parameters to pointer_type() and
array_type() are reordered to be more logical (and to allow
pointer_type() to take a default size of None for the program's
default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
argument only.
A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.
This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.
DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
For C++ support, we need to add an array of template parameters to
struct drgn_type. struct drgn_type already has arrays for members,
enumerators, and parameters embedded at the end of the structure,
because no type needs more than one of those. However, struct, union,
and class types may need members and template parameters. We could add a
separate array of templates, but then it gets confusing having two
methods of storing arrays in struct drgn_type. Let's make these arrays
separate instead of embedding them.
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
In preparation for making drgn_pretty_print_object() more flexible
(i.e., not always "pretty"), rename it to drgn_format_object(). For
consistency, let's rename drgn_pretty_print_type_name(),
drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.
Matt Ahrens reported that comparing two types would sometimes end up in
a seemingly infinite loop, which he discovered was because we repeat
comparisons of types as long as they're not in a cycle. Fix it by
caching all comparisons during a call.
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
This makes several improvements to the hash table API.
The first two changes make things more general in order to be consistent
with the upcoming binary search tree API:
- Items are renamed to entries.
- Positions are renamed to iterators.
- hash_table_empty() is added.
One change makes the definition API more convenient:
- It is no longer necessary to pass the types into
DEFINE_HASH_{MAP,SET}_FUNCTIONS().
A few changes take some good ideas from the C++ STL:
- hash_table_insert() now fails on duplicates instead of overwriting.
- hash_table_delete_iterator() returns the next iterator.
- hash_table_next() returns an iterator instead of modifying it.
One change reduces memory usage:
- The lower-level DEFINE_HASH_TABLE() is cleaned up and exposed as an
alternative to DEFINE_HASH_MAP() and DEFINE_HASH_SET(). This allows us
to get rid of the duplicated key where a hash map value already embeds
the key (the DWARF index file table) and gets rid of the need to make
a dummy hash set entry to do a search (the pointer and array type
caches).
Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the dwarf index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
Older versions of Clang generate a call to __muloti4() for
__builtin_mul_overflow() with mixed signed and unsigned types. However,
Clang doesn't link to compiler-rt by default. Work around it by making
all of our calls to __builtin_mul_overflow() use unsigned types only.
1: https://bugs.llvm.org/show_bug.cgi?id=16404
The current mixed Python/C implementation works well, but it has a
couple of important limitations:
- It's too slow for some common use cases, like iterating over large
data structures.
- It can't be reused in utilities written in other languages.
This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:
- Types are now represented by a single Type class rather than the messy
polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
functions.
The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.
Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.