Commit Graph

84 Commits

Author SHA1 Message Date
Omar Sandoval
0fad8a591a libdrgn: fix finding types beginning in size_t or ptrdiff_t
c_parse_specifier_qualifier_list() checks whether an identifier starts
with "size_t" or "ptrdiff_t" to decide whether to return the size_t or
ptrdiff_t type. This incorrectly matches stuff like like "size_tea" and
"ptrdiff_tee". Fix this by making it an exact comparison.

Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 16:21:56 -08:00
Omar Sandoval
87b7292aa5 Relicense drgn from GPLv3+ to LGPLv2.1+
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.

All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-01 17:05:16 -07:00
Omar Sandoval
03d5c2ebac libdrgn: string_builder: replace string_builder_finalize()
Instead of string_builder_finalize(), which leaves the string_builder in
an undefined state (according to the documentation, at least), define
string_builder_null_terminate(), which documents exactly what it does.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-10-05 15:55:04 -07:00
Omar Sandoval
d76a3a338f libdrgn: string_builder: add dedicated initializer
Rather than documenting how to initialize a struct string_builder,
provide an initializer, STRING_BUILDER_INIT.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-10-05 15:32:07 -07:00
Jay Kamat
063850325f libdrgn: dwarf: look up complete types in namespaces
drgn_debug_info_find_complete() looks up the name of the incomplete type
in the global namespace. This is incorrect for C++: we need to look it
up in the namespace that the DIE is in.

To find the containing namespace, we need to do a DIE ancestor walk. We
don't want to do this for C, so add a flag indicating whether a language
has namespaces to struct drgn_language. If it's true, then we do the
ancestor walk and then look up the name in the appropriate namespace.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2022-07-15 16:02:56 -07:00
prozak
e8dada0ec1 Enable prog.type to work with classes
Also added test for CPP class type

This is a prerequisite to #83

Signed-off-by: mykolal <nickolay.lysenko@gmail.com>
2022-02-22 14:55:23 -08:00
Omar Sandoval
4f5249775d Fix various lints
Some functions that could be static found by -Wmissing-prototypes, some
include-what-you-use warnings, some missing SPDX identifiers. These
lints should be automated at some point.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-17 10:45:42 -08:00
Omar Sandoval
9397a11605 libdrgn: export drgn_language instances
libdrgn currently exports struct drgn_language pointers from
drgn_program_language(), drgn_type_language(), and
drgn_object_language(), but doesn't provide any way to do anything with
them. Export our drgn_language instances and add drgn_language_name() so
that they can at least be compared and printed.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-16 13:07:42 -08:00
Omar Sandoval
5d65ebb04b libdrgn: don't store language structures in one array
In the next change, we want to export languages to the public libdrgn
interface. I couldn't figure out any way to export array elements as
their own symbols. I'd also rather not export the drgn_languages array
indices as an enum because that would preclude ever having any sort of
language plugin support.

Instead, let's get rid of the drgn_languages array as it currently
exists and have separate drgn_language structures. This also allows us
to make a bunch of the C implementation functions static again. We keep
the language numbers so that we can store per-language data efficiently
(currently drgn_program::void_types and languages_py), as well as a
drgn_languages array to go from the language number to the struct
drgn_language. But, this is all internal and could be changed if we ever
support language plugins.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-16 12:47:12 -08:00
Omar Sandoval
c49bba41b2 libdrgn: language_c: replace c_keywords with memswitch
GCC or binutils on Fedora Rawhide for ARM seems to have a bug where
c_keywords gets placed in the .data.rel.ro section (see
https://www.airs.com/blog/archives/189):

$ readelf -s .libs/libdrgnimpl_la-language_c.o | grep -w c_keywords
   475: 00000000    16 OBJECT  LOCAL  DEFAULT  175 c_keywords
$ readelf -S .libs/libdrgnimpl_la-language_c.o | grep -F '[175]'
  [175] .data.rel         PROGBITS        00000000 051f90 000010 00  WA  0   0  4
$ readelf -s .libs/_drgn.so | grep -w c_keywords
  9267: 0008e84c    16 OBJECT  LOCAL  DEFAULT   21 c_keywords.lto_priv.0
$ readelf -S .libs/_drgn.so | grep -F '[21]'
  [21] .data.rel.ro      PROGBITS        0008e018 07e018 000a10 00  WA  0   0  8

This results in a crash on startup when c_keywords_init() attempts to
populate c_keywords.

While this appears to be a compiler or linker bug, I've been meaning to
replace c_keywords with a static lookup function anyways. Now that we
have gen_strswitch.py, we can use it to generate the lookup function.
Add a script, gen_c_keywords_inc_strswitch.py, which generates an array
mapping token kind to spelling, and a memswitch mapping spelling to
token kind.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-02-04 20:26:35 -08:00
Omar Sandoval
8a41adc1b0 libdrgn: language_c: add missing error check in c_parse_abstract_declarator()
Found with clang-static-analyzer.

Reported-by: Kevin Svetlitski <svetlitski@fb.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-12-08 13:56:15 -08:00
Omar Sandoval
3914bb8e29 libdrgn: fix type names referring to anonymous types
A pointer, array, or function referring to an anonymous type currently
includes the full type definition in its type name. This creates very
badly formatted objects for, e.g., drgn's own hash table types. Instead,
use "struct <anonymous>" in the type name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-23 00:57:42 -08:00
Omar Sandoval
c0d8709b45 Update copyright headers to Meta
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-21 15:59:44 -08:00
Omar Sandoval
d1745755f1 Fix some include-what-you-use warnings
Also:

* Rename struct string to struct nstring and move it to its own header.
* Fix scripts/iwyu.py, which was broken by commit 5541fad063 ("Fix
  some flake8 errors").
* Add workarounds for a few outstanding include-what-you-use issues.

There is still a false positive for
include-what-you-use/include-what-you-use#970, but hopefully that is
fixed soon.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-10 15:09:29 -08:00
Omar Sandoval
fba5947fec libdrgn: add array_for_each()
And use it in a few appropriate places. This should hopefully make it
harder to make iteration mistakes like the one fixed by commit
4755cfac7c ("libdrgn: dwarf_index: increment correct variable when
rolling back"). While we're doing this, move ARRAY_SIZE() into a new
header file with array_for_each() and make it lowercase.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-08-23 17:32:00 -07:00
Omar Sandoval
0e3054a0ba libdrgn: make addresses wrap around when reading memory
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:

1. Reworking drgn_memory_reader to work with an inclusive address range
   so that a segment can contain UINT64_MAX. drgn_memory_reader remains
   agnostic to the maximum address and requires that address ranges do
   not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
   drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_reader() to
   drgn_program_read_memory() now that they are no longer equivalent.

(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-06-03 17:49:29 -07:00
Omar Sandoval
a4b9d68a8c Use GPL-3.0-or-later license identifier instead of GPL-3.0+
Apparently the latter is deprecated and the former is preferred.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-03 01:10:35 -07:00
Omar Sandoval
301cc3f139 libdrgn: fix UBSan "applying zero offset to null pointer" errors
There are a couple of places where we compute `NULL + 0`, which is
undefined behavior. Add a helper to do this safely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-04-02 13:38:29 -07:00
Omar Sandoval
a24c0f5b33 libdrgn: clean up usage of drgn_stop
Use drgn_not_found where it's more appropriate, and check explicitly
against drgn_stop instead of err->code == DRGN_ERROR_STOP.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-03-05 12:46:06 -08:00
Omar Sandoval
25eb2abb1a libdrgn: add drgn_platform getters
Add low-level getters equivalent to the drgn_program platform-related
helpers and use them in places where we have checked or can assume that
the platform is known.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-26 16:05:49 -08:00
Omar Sandoval
aaa98ccae3 libdrgn: consistently use __ for __attribute__ names
In some places, we add __ preceding and following an attribute name, and
in some places, we don't. Let's make it consistent. We might as well opt
for the __ to make clashes with macros less likely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-21 03:16:23 -08:00
Omar Sandoval
9fda010789 Track byte order in scalar types instead of objects
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:

- Byte order is only meaningful for scalars. What does it mean for a
  struct to be big endian? A struct doesn't have a most or least
  significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
  byte order (DW_AT_endianity). The only producer I could find that uses
  this is GCC for the scalar_storage_order type attribute, and it only
  uses it for base types, not variables. GDB only seems to use to check
  it for base types, as well.

So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.

The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 21:41:29 -08:00
Omar Sandoval
72b4aa9669 libdrgn: clean up object initialization
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-19 17:43:14 -08:00
Omar Sandoval
78316a28fb libdrgn: remove half-baked support for complex types
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-02-17 14:56:33 -08:00
Omar Sandoval
190062f470 libdrgn: get drgn_type_member.bit_field_size through drgn_member_type()
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-01-08 17:39:51 -08:00
Omar Sandoval
30cfa40a72 libdrgn: rename "unavailable" objects to "absent" objects
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-29 14:58:26 -08:00
Omar Sandoval
e72ecd0e2c libdrgn: replace drgn_program_member_info() with drgn_type_find_member()
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-15 14:40:54 -08:00
Omar Sandoval
738ae2c75f libdrgn: pack struct drgn_object better
We can get struct drgn_object down from 40 bytes to 32 bytes (on x86-64)
by moving the bit_offset and little_endian members out of the value and
reference structs.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
abafdd965f Remove bit_offset from value objects
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:

1. When we store a buffer with a bit_offset, we're storing useless
   padding bits.
2. bit_offset describes a location, or in other words, part of an
   address. This makes sense for references, but not for values, which
   are just a bag of bytes.

Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-14 12:29:17 -08:00
Omar Sandoval
6bd0c2b4d2 libdrgn: add concept of "unavailable" objects
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:58:19 -08:00
Omar Sandoval
5f17281926 libdrgn: make drgn_object::is_reference an enum
To prepare for a new kind of object, replace the is_reference bool with
an enum drgn_object_kind.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 13:37:58 -08:00
Omar Sandoval
edb1fe7f2f libdrgn: rename drgn_object_kind to drgn_object_encoding
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:26 -08:00
Omar Sandoval
2710b4d2aa libdrgn: add macros for strict enum switch statements
There are several places where we'd like to enforce that every
enumeration is handled in a switch. Add SWITCH_ENUM() and
SWITCH_ENUM_DEFAULT() macros for that and use them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-12-04 12:02:23 -08:00
Omar Sandoval
3c5d22637e libdrgn: clean up hash function APIs and improve documentation
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 16:20:08 -07:00
Omar Sandoval
761da83ddd libdrgn: add {min,max}_iconst() and rewrite min() and max()
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use it in hash_table.h. While we're here, let's split these into their
own header file and document them better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-10 23:48:03 -07:00
Omar Sandoval
fa44171ba1 libdrgn: split bit operations into their own header
And improve their documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 17:44:15 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
e49a87a3d7 libdrgn: remove struct drgn_object::prog
We can get it via the type now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:21 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
948cda2941 libdrgn: add vector/hash table initializers and update coding style
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
2020-07-01 12:48:24 -07:00
Omar Sandoval
8b264f8823 Update copyright headers to Facebook and add missing headers
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
2020-05-15 15:13:02 -07:00
Omar Sandoval
3d59e042f4 libdrgn: don't open-code fls()
c_integer_literal() has an open-coded equivalent of fls() that assumes
that unsigned long long is 64 bits. Use fls() instead.
2020-05-08 00:20:42 -07:00
Omar Sandoval
0a100064c1 libdrgn: improve and rename DRGN_UNREACHABLE()
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.

DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
2020-05-07 15:16:22 -07:00
Omar Sandoval
a3248b51e3 libdrgn: fix use after free when formatting compound types
compound_initializer_init_next() saves a pointer to the compound
initializer stack and uses it after appending to the stack, which may
have reallocated the stack.
2020-04-13 16:49:18 -07:00
Omar Sandoval
cae7336750 libdrgn: fix error when expecting identifier after tag in type name
We should be looking at the kind of the previous token, not the kind of
the unexpected token.

Closes #52.
2020-03-13 11:07:46 -07:00
Jay Kamat
6c264b0eae libdrgn: add language to struct drgn_type
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
2020-02-26 19:55:42 -08:00
Omar Sandoval
9e2df9f217 libdrgn: put language definitions in one array
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
2020-02-26 19:55:42 -08:00
Jay Kamat
054cb54a01 libdrgn: Rename find_symbol to find_symbol_by_address 2020-02-12 14:06:49 -08:00
Serapheim Dimitropoulos
ad82e9623a Introduce OutOfBoundsError
Decouple some of the responsibilities of FaultError to
OutOfBoundsError so consumers can differentiate between
invalid memory accesses and running out of bounds in
drgn Objects which may be based on valid memory address.
2020-02-04 14:59:31 -08:00