c_parse_specifier_qualifier_list() checks whether an identifier starts
with "size_t" or "ptrdiff_t" to decide whether to return the size_t or
ptrdiff_t type. This incorrectly matches stuff like like "size_tea" and
"ptrdiff_tee". Fix this by making it an exact comparison.
Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.
All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Instead of string_builder_finalize(), which leaves the string_builder in
an undefined state (according to the documentation, at least), define
string_builder_null_terminate(), which documents exactly what it does.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Rather than documenting how to initialize a struct string_builder,
provide an initializer, STRING_BUILDER_INIT.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_debug_info_find_complete() looks up the name of the incomplete type
in the global namespace. This is incorrect for C++: we need to look it
up in the namespace that the DIE is in.
To find the containing namespace, we need to do a DIE ancestor walk. We
don't want to do this for C, so add a flag indicating whether a language
has namespaces to struct drgn_language. If it's true, then we do the
ancestor walk and then look up the name in the appropriate namespace.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
Some functions that could be static found by -Wmissing-prototypes, some
include-what-you-use warnings, some missing SPDX identifiers. These
lints should be automated at some point.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
libdrgn currently exports struct drgn_language pointers from
drgn_program_language(), drgn_type_language(), and
drgn_object_language(), but doesn't provide any way to do anything with
them. Export our drgn_language instances and add drgn_language_name() so
that they can at least be compared and printed.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In the next change, we want to export languages to the public libdrgn
interface. I couldn't figure out any way to export array elements as
their own symbols. I'd also rather not export the drgn_languages array
indices as an enum because that would preclude ever having any sort of
language plugin support.
Instead, let's get rid of the drgn_languages array as it currently
exists and have separate drgn_language structures. This also allows us
to make a bunch of the C implementation functions static again. We keep
the language numbers so that we can store per-language data efficiently
(currently drgn_program::void_types and languages_py), as well as a
drgn_languages array to go from the language number to the struct
drgn_language. But, this is all internal and could be changed if we ever
support language plugins.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
GCC or binutils on Fedora Rawhide for ARM seems to have a bug where
c_keywords gets placed in the .data.rel.ro section (see
https://www.airs.com/blog/archives/189):
$ readelf -s .libs/libdrgnimpl_la-language_c.o | grep -w c_keywords
475: 00000000 16 OBJECT LOCAL DEFAULT 175 c_keywords
$ readelf -S .libs/libdrgnimpl_la-language_c.o | grep -F '[175]'
[175] .data.rel PROGBITS 00000000 051f90 000010 00 WA 0 0 4
$ readelf -s .libs/_drgn.so | grep -w c_keywords
9267: 0008e84c 16 OBJECT LOCAL DEFAULT 21 c_keywords.lto_priv.0
$ readelf -S .libs/_drgn.so | grep -F '[21]'
[21] .data.rel.ro PROGBITS 0008e018 07e018 000a10 00 WA 0 0 8
This results in a crash on startup when c_keywords_init() attempts to
populate c_keywords.
While this appears to be a compiler or linker bug, I've been meaning to
replace c_keywords with a static lookup function anyways. Now that we
have gen_strswitch.py, we can use it to generate the lookup function.
Add a script, gen_c_keywords_inc_strswitch.py, which generates an array
mapping token kind to spelling, and a memswitch mapping spelling to
token kind.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
A pointer, array, or function referring to an anonymous type currently
includes the full type definition in its type name. This creates very
badly formatted objects for, e.g., drgn's own hash table types. Instead,
use "struct <anonymous>" in the type name.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Also:
* Rename struct string to struct nstring and move it to its own header.
* Fix scripts/iwyu.py, which was broken by commit 5541fad063 ("Fix
some flake8 errors").
* Add workarounds for a few outstanding include-what-you-use issues.
There is still a false positive for
include-what-you-use/include-what-you-use#970, but hopefully that is
fixed soon.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
And use it in a few appropriate places. This should hopefully make it
harder to make iteration mistakes like the one fixed by commit
4755cfac7c ("libdrgn: dwarf_index: increment correct variable when
rolling back"). While we're doing this, move ARRAY_SIZE() into a new
header file with array_for_each() and make it lowercase.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:
1. Reworking drgn_memory_reader to work with an inclusive address range
so that a segment can contain UINT64_MAX. drgn_memory_reader remains
agnostic to the maximum address and requires that address ranges do
not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_reader() to
drgn_program_read_memory() now that they are no longer equivalent.
(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of places where we compute `NULL + 0`, which is
undefined behavior. Add a helper to do this safely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Use drgn_not_found where it's more appropriate, and check explicitly
against drgn_stop instead of err->code == DRGN_ERROR_STOP.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add low-level getters equivalent to the drgn_program platform-related
helpers and use them in places where we have checked or can assume that
the platform is known.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In some places, we add __ preceding and following an attribute name, and
in some places, we don't. Let's make it consistent. We might as well opt
for the __ to make clashes with macros less likely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:
- Byte order is only meaningful for scalars. What does it mean for a
struct to be big endian? A struct doesn't have a most or least
significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
byte order (DW_AT_endianity). The only producer I could find that uses
this is GCC for the scalar_storage_order type attribute, and it only
uses it for base types, not variables. GDB only seems to use to check
it for base types, as well.
So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.
The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We can get struct drgn_object down from 40 bytes to 32 bytes (on x86-64)
by moving the bit_offset and little_endian members out of the value and
reference structs.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:
1. When we store a buffer with a bit_offset, we're storing useless
padding bits.
2. bit_offset describes a location, or in other words, part of an
address. This makes sense for references, but not for values, which
are just a bag of bytes.
Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are several places where we'd like to enforce that every
enumeration is handled in a switch. Add SWITCH_ENUM() and
SWITCH_ENUM_DEFAULT() macros for that and use them.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use it in hash_table.h. While we're here, let's split these into their
own header file and document them better.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).
This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.
Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.
So, let's reimagine types as being owned by a program. This has a few
parts:
1. In libdrgn, simple types are now created by factory functions,
drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
Program class.
5. While we're changing the API, the parameters to pointer_type() and
array_type() are reordered to be more logical (and to allow
pointer_type() to take a default size of None for the program's
default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
argument only.
A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.
This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.
DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
compound_initializer_init_next() saves a pointer to the compound
initializer stack and uses it after appending to the stack, which may
have reallocated the stack.
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
Decouple some of the responsibilities of FaultError to
OutOfBoundsError so consumers can differentiate between
invalid memory accesses and running out of bounds in
drgn Objects which may be based on valid memory address.