Commit Graph

40 Commits

Author SHA1 Message Date
Omar Sandoval
c0d8709b45 Update copyright headers to Meta
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-21 15:59:44 -08:00
Omar Sandoval
64c4afa298 libdrgn: type: fix hash table insertion error check
table_insert_searched() returns -1 when insertion fails.

Fixes: a97f6c4fa2 ("Associate types with program")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-17 18:04:57 -08:00
Omar Sandoval
1339dc6a2f libdrgn: hash_table: move entry_to_key to DEFINE_HASH_TABLE_FUNCTIONS()
DEFINE_HASH_TABLE_TYPE() doesn't actually need to know the key type.
Move that argument (and some of the derived constants) to
DEFINE_HASH_TABLE_FUNCTIONS(). This will allow recursive hash table
types. As a nice side effect, it also reduces the size of common header
files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-10-23 00:52:23 -07:00
Omar Sandoval
fba5947fec libdrgn: add array_for_each()
And use it in a few appropriate places. This should hopefully make it
harder to make iteration mistakes like the one fixed by commit
4755cfac7c ("libdrgn: dwarf_index: increment correct variable when
rolling back"). While we're doing this, move ARRAY_SIZE() into a new
header file with array_for_each() and make it lowercase.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-23 17:32:00 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
c9dc7fd574 libdrgn: type: fix memcpy() undefined behavior
It's undefined behavior to pass NULL to memcpy() even if the length is
zero. See also commit a17215e984 ("libdrgn: dwarf_index: fix memcpy()
undefined behavior").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 13:38:13 -07:00
Omar Sandoval
b5ed892481 Fix some include-what-you-use warnings and update for Bear 3
Bear 3 changed the CLI arguments, so update scripts/iwyu.py for it and
clean up some new warnings.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
25eb2abb1a libdrgn: add drgn_platform getters
Add low-level getters equivalent to the drgn_program platform-related
helpers and use them in places where we have checked or can assume that
the platform is known.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
352c31e1ac Add support for C++ template parameters
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.

With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.

Based on a patch from Jay Kamat.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d35243b354 libdrgn: replace lazy types with lazy objects
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
190062f470 libdrgn: get drgn_type_member.bit_field_size through drgn_member_type()
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
d6a840ec30 libdrgn: deinitialize empty members/parameters/enumerators when deduplicating
Right now, an empty builder vector will not have anything to free, but
if we start pre-reserving these later, it will be a leak.

Fixes: c7af566c6e ("libdrgn: deduplicate all types with no members/parameters/enumerators")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-06 17:47:25 -08:00
Omar Sandoval
c7af566c6e libdrgn: deduplicate all types with no members/parameters/enumerators
Even if a compound, function, or enumerated type is complete, we can
still deduplicate it as long as it doesn't have members, parameters, or
enumerators.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-06 01:59:48 -08:00
Omar Sandoval
7d7aa7bf7b libdrgn/python: remove Type == operator
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-22 03:11:38 -08:00
Omar Sandoval
40004e5c8f libdrgn/python: add offsetof()
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:46:41 -08:00
Omar Sandoval
a595e52d22 libdrgn/python: add Type.has_member()
Add drgn_type_has_member() to libdrgn and Type.has_member() to the
Python bindings. This can simplify some version checks, like the one in
_for_each_block_device() since commit 9a10a927b0 ("helpers: fix
for_each_{disk,partition}() on kernels >= v5.1").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 16:38:48 -08:00
Omar Sandoval
e72ecd0e2c libdrgn: replace drgn_program_member_info() with drgn_type_find_member()
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:40:54 -08:00
Omar Sandoval
edb1fe7f2f libdrgn: rename drgn_object_kind to drgn_object_encoding
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:26 -08:00
Omar Sandoval
2710b4d2aa libdrgn: add macros for strict enum switch statements
There are several places where we'd like to enforce that every
enumeration is handled in a switch. Add SWITCH_ENUM() and
SWITCH_ENUM_DEFAULT() macros for that and use them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:23 -08:00
Omar Sandoval
3c5d22637e libdrgn: clean up hash function APIs and improve documentation
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 16:20:08 -07:00
Omar Sandoval
4cbb9b552a libdrgn: fix comparison of types with anonymous members
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-08 17:32:46 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
3028da4d1d libdrgn: compare language in drgn_type_eq()
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-07-08 22:07:49 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
0a100064c1 libdrgn: improve and rename DRGN_UNREACHABLE()
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.

DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
2020-05-07 15:16:22 -07:00
Jay Kamat
ecef9d74ef libdrgn: get rid of arrays embedded in drgn_type
For C++ support, we need to add an array of template parameters to
struct drgn_type. struct drgn_type already has arrays for members,
enumerators, and parameters embedded at the end of the structure,
because no type needs more than one of those. However, struct, union,
and class types may need members and template parameters. We could add a
separate array of templates, but then it gets confusing having two
methods of storing arrays in struct drgn_type. Let's make these arrays
separate instead of embedding them.
2020-04-13 16:47:05 -07:00
Jay Kamat
6c264b0eae libdrgn: add language to struct drgn_type
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
2020-02-26 19:55:42 -08:00
Omar Sandoval
3b22bd3022 libdrgn: rename pretty_print -> format
In preparation for making drgn_pretty_print_object() more flexible
(i.e., not always "pretty"), rename it to drgn_format_object(). For
consistency, let's rename drgn_pretty_print_type_name(),
drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.
2019-12-16 11:21:12 -08:00
Omar Sandoval
dd59e5431c libdrgn: fix extremely slow type comparison
Matt Ahrens reported that comparing two types would sometimes end up in
a seemingly infinite loop, which he discovered was because we repeat
comparisons of types as long as they're not in a cycle. Fix it by
caching all comparisons during a call.
2019-11-24 09:46:00 -08:00
Amlan Nayak
0df2152307 Add basic class type support
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
2019-11-18 10:36:40 -08:00
Omar Sandoval
dcddaa2cc1 libdrgn: revamp hash table API
This makes several improvements to the hash table API.

The first two changes make things more general in order to be consistent
with the upcoming binary search tree API:

- Items are renamed to entries.
- Positions are renamed to iterators.
- hash_table_empty() is added.

One change makes the definition API more convenient:

- It is no longer necessary to pass the types into
  DEFINE_HASH_{MAP,SET}_FUNCTIONS().

A few changes take some good ideas from the C++ STL:

- hash_table_insert() now fails on duplicates instead of overwriting.
- hash_table_delete_iterator() returns the next iterator.
- hash_table_next() returns an iterator instead of modifying it.

One change reduces memory usage:

- The lower-level DEFINE_HASH_TABLE() is cleaned up and exposed as an
  alternative to DEFINE_HASH_MAP() and DEFINE_HASH_SET(). This allows us
  to get rid of the duplicated key where a hash map value already embeds
  the key (the DWARF index file table) and gets rid of the need to make
  a dummy hash set entry to do a search (the pointer and array type
  caches).
2019-05-24 17:48:05 -07:00
Omar Sandoval
a98445c277 libdrgn: make type index pluggable with callbacks
Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the dwarf index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
2019-05-06 14:55:34 -07:00
Omar Sandoval
932b7857b5 libdrgn: expose primitive type concept to public interface
Previously known as c_type.
2019-05-06 14:55:34 -07:00
Omar Sandoval
2dd14ad522 libdrgn: work around "undefined reference to '__muloti4'" when using Clang
Older versions of Clang generate a call to __muloti4() for
__builtin_mul_overflow() with mixed signed and unsigned types. However,
Clang doesn't link to compiler-rt by default. Work around it by making
all of our calls to __builtin_mul_overflow() use unsigned types only.

1: https://bugs.llvm.org/show_bug.cgi?id=16404
2019-04-02 14:12:11 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00