In the next change, we want to export languages to the public libdrgn
interface. I couldn't figure out any way to export array elements as
their own symbols. I'd also rather not export the drgn_languages array
indices as an enum because that would preclude ever having any sort of
language plugin support.
Instead, let's get rid of the drgn_languages array as it currently
exists and have separate drgn_language structures. This also allows us
to make a bunch of the C implementation functions static again. We keep
the language numbers so that we can store per-language data efficiently
(currently drgn_program::void_types and languages_py), as well as a
drgn_languages array to go from the language number to the struct
drgn_language. But, this is all internal and could be changed if we ever
support language plugins.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type.
Move that argument (and some of the derived constants) to
DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table
types. As a nice side effect, it also reduces the size of common header
files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
And use it in a few appropriate places. This should hopefully make it
harder to make iteration mistakes like the one fixed by commit
4755cfac7c ("libdrgn: dwarf_index: increment correct variable when
rolling back"). While we're doing this, move ARRAY_SIZE() into a new
header file with array_for_each() and make it lowercase.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
It's undefined behavior to pass NULL to memcpy() even if the length is
zero. See also commit a17215e984 ("libdrgn: dwarf_index: fix memcpy()
undefined behavior").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add low-level getters equivalent to the drgn_program platform-related
helpers and use them in places where we have checked or can assume that
the platform is known.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:
- Byte order is only meaningful for scalars. What does it mean for a
struct to be big endian? A struct doesn't have a most or least
significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
byte order (DW_AT_endianity). The only producer I could find that uses
this is GCC for the scalar_storage_order type attribute, and it only
uses it for base types, not variables. GDB only seems to use to check
it for base types, as well.
So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.
The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.
With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.
Based on a patch from Jay Kamat.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Right now, an empty builder vector will not have anything to free, but
if we start pre-reserving these later, it will be a leak.
Fixes: c7af566c6e ("libdrgn: deduplicate all types with no members/parameters/enumerators")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Even if a compound, function, or enumerated type is complete, we can
still deduplicate it as long as it doesn't have members, parameters, or
enumerators.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add drgn_type_has_member() to libdrgn and Type.has_member() to the
Python bindings. This can simplify some version checks, like the one in
_for_each_block_device() since commit 9a10a927b0 ("helpers: fix
for_each_{disk,partition}() on kernels >= v5.1").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are several places where we'd like to enforce that every
enumeration is handled in a switch. Add SWITCH_ENUM() and
SWITCH_ENUM_DEFAULT() macros for that and use them.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).
This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.
Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.
So, let's reimagine types as being owned by a program. This has a few
parts:
1. In libdrgn, simple types are now created by factory functions,
drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
Program class.
5. While we're changing the API, the parameters to pointer_type() and
array_type() are reordered to be more logical (and to allow
pointer_type() to take a default size of None for the program's
default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
argument only.
A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.
This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.
DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
For C++ support, we need to add an array of template parameters to
struct drgn_type. struct drgn_type already has arrays for members,
enumerators, and parameters embedded at the end of the structure,
because no type needs more than one of those. However, struct, union,
and class types may need members and template parameters. We could add a
separate array of templates, but then it gets confusing having two
methods of storing arrays in struct drgn_type. Let's make these arrays
separate instead of embedding them.
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
In preparation for making drgn_pretty_print_object() more flexible
(i.e., not always "pretty"), rename it to drgn_format_object(). For
consistency, let's rename drgn_pretty_print_type_name(),
drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.
Matt Ahrens reported that comparing two types would sometimes end up in
a seemingly infinite loop, which he discovered was because we repeat
comparisons of types as long as they're not in a cycle. Fix it by
caching all comparisons during a call.
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
This makes several improvements to the hash table API.
The first two changes make things more general in order to be consistent
with the upcoming binary search tree API:
- Items are renamed to entries.
- Positions are renamed to iterators.
- hash_table_empty() is added.
One change makes the definition API more convenient:
- It is no longer necessary to pass the types into
DEFINE_HASH_{MAP,SET}_FUNCTIONS().
A few changes take some good ideas from the C++ STL:
- hash_table_insert() now fails on duplicates instead of overwriting.
- hash_table_delete_iterator() returns the next iterator.
- hash_table_next() returns an iterator instead of modifying it.
One change reduces memory usage:
- The lower-level DEFINE_HASH_TABLE() is cleaned up and exposed as an
alternative to DEFINE_HASH_MAP() and DEFINE_HASH_SET(). This allows us
to get rid of the duplicated key where a hash map value already embeds
the key (the DWARF index file table) and gets rid of the need to make
a dummy hash set entry to do a search (the pointer and array type
caches).
Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the dwarf index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
Older versions of Clang generate a call to __muloti4() for
__builtin_mul_overflow() with mixed signed and unsigned types. However,
Clang doesn't link to compiler-rt by default. Work around it by making
all of our calls to __builtin_mul_overflow() use unsigned types only.
1: https://bugs.llvm.org/show_bug.cgi?id=16404
The current mixed Python/C implementation works well, but it has a
couple of important limitations:
- It's too slow for some common use cases, like iterating over large
data structures.
- It can't be reused in utilities written in other languages.
This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:
- Types are now represented by a single Type class rather than the messy
polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
functions.
The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.
Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.