Commit Graph

47 Commits

Author SHA1 Message Date
Omar Sandoval
e49a87a3d7 libdrgn: remove struct drgn_object::prog
We can get it via the type now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:21 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
3d59e042f4 libdrgn: don't open-code fls()
c_integer_literal() has an open-coded equivalent of fls() that assumes
that unsigned long long is 64 bits. Use fls() instead.
2020-05-08 00:20:42 -07:00
Omar Sandoval
0a100064c1 libdrgn: improve and rename DRGN_UNREACHABLE()
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.

DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
2020-05-07 15:16:22 -07:00
Omar Sandoval
a3248b51e3 libdrgn: fix use after free when formatting compound types
compound_initializer_init_next() saves a pointer to the compound
initializer stack and uses it after appending to the stack, which may
have reallocated the stack.
2020-04-13 16:49:18 -07:00
Omar Sandoval
cae7336750 libdrgn: fix error when expecting identifier after tag in type name
We should be looking at the kind of the previous token, not the kind of
the unexpected token.

Closes #52.
2020-03-13 11:07:46 -07:00
Jay Kamat
6c264b0eae libdrgn: add language to struct drgn_type
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
2020-02-26 19:55:42 -08:00
Omar Sandoval
9e2df9f217 libdrgn: put language definitions in one array
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
2020-02-26 19:55:42 -08:00
Jay Kamat
054cb54a01 libdrgn: Rename find_symbol to find_symbol_by_address 2020-02-12 14:06:49 -08:00
Serapheim Dimitropoulos
ad82e9623a Introduce OutOfBoundsError
Decouple some of the responsibilities of FaultError to
OutOfBoundsError so consumers can differentiate between
invalid memory accesses and running out of bounds in
drgn Objects which may be based on valid memory address.
2020-02-04 14:59:31 -08:00
Omar Sandoval
1443d17fb4 libdrgn: add DRGN_FORMAT_OBJECT_IMPLICIT_ELEMENTS 2019-12-19 11:43:54 -08:00
Omar Sandoval
db66952b2e libdrgn: add DRGN_FORMAT_OBJECT_IMPLICIT_MEMBERS 2019-12-19 11:43:54 -08:00
Omar Sandoval
c8434e9a9e libdrgn: add DRGN_FORMAT_OBJECT_ELEMENT_INDICES 2019-12-19 11:43:54 -08:00
Omar Sandoval
cfceb491db libdrgn: add DRGN_FORMAT_OBJECT_MEMBER_NAMES 2019-12-19 11:43:54 -08:00
Omar Sandoval
4fad941ec1 libdrgn: add DRGN_FORMAT_OBJECT_{MEMBERS,ELEMENTS}_SAME_LINE 2019-12-19 11:43:54 -08:00
Omar Sandoval
c3f69ba559 libdrgn: use c_format_initializer for struct/union/class 2019-12-19 11:43:54 -08:00
Omar Sandoval
e2d2df4024 libdrgn: factor c_format_initializer out of c_format_array
This will also be used for compound types, and we're going to add a few
more options that we should handle in one place.
2019-12-19 11:43:54 -08:00
Omar Sandoval
6bb8da04a0 libdrgn: omit trailing comma when formatting one-line array
This is somewhat arbitrary, but I think it looks more natural to only
use the trailing comma for multi-line initializers.
2019-12-19 11:43:54 -08:00
Omar Sandoval
1411ba36a8 libdrgn: remove dead code in c_format_array_object
When we're checking whether the element that we formatted on one line
would fit on the previous line, we check whether the previous line is
empty with remaining_columns == start_columns. This is never true, as
remaining_columns is always set to start_columns - 1 at most, and it
only decreases from there until we start  a new line.
2019-12-19 11:43:54 -08:00
Omar Sandoval
7a3bf73df0 libdrgn: replace drgn_object_truthiness() with drgn_object_is_zero()
drgn_object_truthiness() is a misnomer, as truthiness is a
language-specific concept. Instead, invert the return value and rename
it to drgn_object_is_zero(), which more accurately conveys the meaning.
2019-12-19 11:43:54 -08:00
Omar Sandoval
d77b7bd7e3 libdrgn: add DRGN_FORMAT_OBJECT_{TYPE_NAME,MEMBER_TYPE_NAMES,ELEMENT_TYPE_NAMES} 2019-12-19 11:43:54 -08:00
Omar Sandoval
89307c532a libdrgn: add DRGN_FORMAT_OBJECT_CHAR 2019-12-19 11:43:54 -08:00
Omar Sandoval
7cee597fff libdrgn: add DRGN_FORMAT_OBJECT_STRING 2019-12-19 11:43:54 -08:00
Omar Sandoval
5865fa4d16 libdrgn: add DRGN_FORMAT_OBJECT_SYMBOLIZE 2019-12-19 11:43:54 -08:00
Omar Sandoval
f58bc4bf3a libdrgn: add DRGN_FORMAT_OBJECT_DEREFERENCE 2019-12-19 11:43:54 -08:00
Omar Sandoval
5fb02f03fd libdrgn: add flags to drgn_format_object() 2019-12-19 11:43:54 -08:00
Omar Sandoval
3b22bd3022 libdrgn: rename pretty_print -> format
In preparation for making drgn_pretty_print_object() more flexible
(i.e., not always "pretty"), rename it to drgn_format_object(). For
consistency, let's rename drgn_pretty_print_type_name(),
drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.
2019-12-16 11:21:12 -08:00
Omar Sandoval
f327552229 libdrgn: add strstartswith()
Instead of open coding this check all over the place, add a helper
function.
2019-12-12 13:26:50 -08:00
Amlan Nayak
0df2152307 Add basic class type support
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
2019-11-18 10:36:40 -08:00
Omar Sandoval
97b5967c37 libdrgn: add a couple of helpers for working with buffer and reference objects
Expose drgn_object_buffer() and add drgn_buffer_object_size() and
drgn_reference_object_size().
2019-10-28 11:34:08 -07:00
Omar Sandoval
dcddaa2cc1 libdrgn: revamp hash table API
This makes several improvements to the hash table API.

The first two changes make things more general in order to be consistent
with the upcoming binary search tree API:

- Items are renamed to entries.
- Positions are renamed to iterators.
- hash_table_empty() is added.

One change makes the definition API more convenient:

- It is no longer necessary to pass the types into
  DEFINE_HASH_{MAP,SET}_FUNCTIONS().

A few changes take some good ideas from the C++ STL:

- hash_table_insert() now fails on duplicates instead of overwriting.
- hash_table_delete_iterator() returns the next iterator.
- hash_table_next() returns an iterator instead of modifying it.

One change reduces memory usage:

- The lower-level DEFINE_HASH_TABLE() is cleaned up and exposed as an
  alternative to DEFINE_HASH_MAP() and DEFINE_HASH_SET(). This allows us
  to get rid of the duplicated key where a hash map value already embeds
  the key (the DWARF index file table) and gets rid of the need to make
  a dummy hash set entry to do a search (the pointer and array type
  caches).
2019-05-24 17:48:05 -07:00
Omar Sandoval
f08d4c9a08 libdrgn: make string_builder API return bool
It can only fail with no memory, so simplify it.
2019-05-14 10:07:50 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
6d7b0631b9 libdrgn: simplify string_builder API
Instead of maintaining a null-terminated string, null-terminate just
before returning the string as a "finalize" step.
2019-05-09 15:25:58 -07:00
Omar Sandoval
5200a6652c libdrgn: embed memory reader, type index, and symbol index in program 2019-05-06 14:55:34 -07:00
Omar Sandoval
a98445c277 libdrgn: make type index pluggable with callbacks
Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the dwarf index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
2019-05-06 14:55:34 -07:00
Omar Sandoval
043cddf6d8 libdrgn: move member cache to type index
It makes more sense here than in struct drgn_program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
3645ce78ea libdrgn: assume all pointers have same size in type index 2019-05-06 14:55:34 -07:00
Omar Sandoval
06960f591c libdrgn: look up primitive types on demand
Instead of caching all primitive types ahead of time, look them up on
demand. This is preparation for making the type index API more flexible.
2019-05-06 14:55:34 -07:00
Omar Sandoval
932b7857b5 libdrgn: expose primitive type concept to public interface
Previously known as c_type.
2019-05-06 14:55:34 -07:00
Omar Sandoval
97f5cf70c6 libdrgn: fix C array and function casting
Casting an array or function should first convert the array or function
into a pointer.
2019-04-12 16:40:12 -07:00
Omar Sandoval
1db8d11f84 libdrgn: allow void pointer arithmetic
This is a GCC extension, but it's used pretty often in practice.
2019-04-12 16:06:24 -07:00
Omar Sandoval
309dc82789 libdrgn: allow comparing any pointer types in C
There's a bug that we don't allow comparisons between void * and other
pointer types, so let's fix it by allowing all pointer comparisons
regardless of the referenced type. Although this isn't valid by the C
standard, GCC and Clang both allow it by default (with a warning).
2019-04-12 15:44:08 -07:00
Omar Sandoval
75c3679147 Rewrite drgn core in C
The current mixed Python/C implementation works well, but it has a
couple of important limitations:

- It's too slow for some common use cases, like iterating over large
  data structures.
- It can't be reused in utilities written in other languages.

This replaces the internals with a new library written in C, libdrgn. It
includes Python bindings with mostly the same public interface as
before, with some important improvements:

- Types are now represented by a single Type class rather than the messy
  polymorphism in the Python implementation.
- Qualifiers are a bitmask instead of a set of strings.
- Bit fields are not considered a separate type.
- The lvalue/rvalue terminology is replaced with reference/value.
- Structure, union, and array values are better supported.
- Function objects are supported.
- Program distinguishes between lookups of variables, constants, and
  functions.

The C rewrite is about 6x as fast as the original Python when using the
Python bindings, and about 8x when using the C API directly.

Currently, the exposed API in C is fairly conservative. In the future,
the memory reader, type index, and object index APIs will probably be
exposed for more flexibility.
2019-04-02 14:12:07 -07:00