This will be used for partial 128-bit object support. There are other
places that should probably be converted to use it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The 32-bit and 64-bit variants have different register sizes, so they're
different architectures in drgn. For now, put them in the same file so
that they can share the relocation implementation. We'll need to figure
out how to handle registers later.
P.S. RISC-V has the weirdest relocations so far. /proc/kcore also
appears to be broken.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The only relocation type I saw in Debian's kernel module debug info was
R_ARM_ABS32. R_ARM_REL32 is easy. The Linux kernel supports a bunch of
other ones that don't seem relevant to debug info.
Unfortunately, I wasn't able to test this because /proc/kcore doesn't
exist on Arm. This apparently goes all the way back to 2003:
https://lwn.net/Articles/45315/.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The only relocation types I saw in Debian's kernel module debug info
were R_AARCH64_ABS64 and R_AARCH64_ABS32. R_AARCH64_ABS16,
R_AARCH64_PREL64, R_AARCH64_PREL32, and R_AARCH64_PREL16 are all easy.
The remaining types supported by the Linux kernel are for movw and
immediate instructions, which aren't relevant to debug info.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The only relocation type I saw in Debian's kernel module debug info was
R_386_32. R_386_PC32 is easy. The Linux kernel also supports
R_386_PLT32, but that's the same story as R_X86_64_PLT32 in x86-64, so
we don't implement it for now.
I was torn between naming it i386, x86, or IA-32. x86 isn't immediately
clear whether x86-64 is included or not. No one other than Intel calls
it IA-32. i386 might incorrectly imply that it is strictly the original
i386 instruction set with no later extensions, but the more general
meaning is used frequently in the Linux world (e.g., Debian and QEMU
both call it i386), so I went with that in the end.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Our cheap heuristic for the default language will not always be correct,
and although we can improve it as cases arise, we should also just have
a way for the user to explicitly set the default language. Add
drgn_program_set_language() to libdrgn and allow setting
drgn.Program.language in the Python bindings. This will also make unit
testing different languages easier.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
libdrgn currently exports struct drgn_language pointers from
drgn_program_language(), drgn_type_language(), and
drgn_object_language(), but doesn't provide any way to do anything with
them. Export our drgn_language instances and add drgn_language_name() so
that they can at least be compared and printed.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If a TID does not exist, then linux_helper_find_task() succeeds but
returns a null pointer object. Check for that instead of returning a
bogus thread.
Fixes: 301cc767ba ("Implement a new API for representing threads")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently only supported for user-space crash dumps. E.g. no support for
live user-space application debugging or kernel debugging.
Closes#144.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Currently we can lookup symbols by name or address, but this will only
return one symbol, prioritizing the global symbols. However, symbols may
share the same name, and symbols may also overlap address ranges, so
it's possible for searches to return multiple results. Add functions
which can return a list of multiple matching symbols.
Signed-off-by: Stephen Brennan <stephen@brennan.io>
Previously, drgn had no way to represent a thread – retrieving a stack
trace (the only extant thread-specific operation) was achieved by
requiring the user to directly provide a tid.
This commit introduces the scaffolding for the design outlined in
issue #92, and implements the corresponding methods for userspace core
dumps, the live Linux kernel, and Linux kernel core dumps. Future work
will build on top of this commit to support live userspace processes.
Signed-off-by: Kevin Svetlitski <svetlitski@fb.com>
After all of the preparatory work, the last two missing pieces are a way
to find a variable by name in the list of scopes that we saved while
unwinding, and a way to find the containing scopes of an inlined
function. With that, we can finally look up parameters and variables in
stack traces.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If we want to access a parameter or local variable in an inlined
function, then we need a stack frame for that function. It's also much
more useful to see inlined functions in the stack trace in general. So,
when we've unwound the registers for a stack frame, walk the debugging
information to find all of the (possibly inlined) functions at the
program counter, and add a drgn stack frame for each of those.
Also add StackFrame.name and StackFrame.is_inline so that we can
distinguish inline frames. Also add StackFrame.source() to get the
filename and line and column numbers. Finally, add the source code
location to pretty-printed stack traces and add pretty-printing for
individual stack frames that includes extra information.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Define that addresses for memory reads wrap around after the maximum
address rather than the current unpredictable behavior. This is done by:
1. Reworking drgn_memory_reader to work with an inclusive address range
so that a segment can contain UINT64_MAX. drgn_memory_reader remains
agnostic to the maximum address and requires that address ranges do
not overflow a uint64_t.
2. Adding the overflow/wrap-around logic to
drgn_program_add_memory_segment() and drgn_program_read_memory().
3. Changing direct uses of drgn_memory_reader_reader() to
drgn_program_read_memory() now that they are no longer equivalent.
(For some platforms, a fault might be more appropriate than wrapping
around, but this is a step in the right direction.)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The elfutils DWARF unwinder has a couple of limitations:
1. libdwfl doesn't have an interface for getting register values, so we
have to bundle a patched version of elfutils with drgn.
2. Error handling is very awkward: dwfl_getthread_frames() can return an
error even on success, so we have to squirrel away our own errors in
the callback.
Furthermore, there are a couple of things that will be easier with our
own unwinder:
1. Integrating unwinding using ORC will be easier when we're handling
unwinding ourselves.
2. Support for local variables isn't too far away now that we have DWARF
expression evaluation.
Now that we have the register state, CFI, and DWARF expression pieces in
place, stitch them together with the new unwinder, and tweak the public
API a bit to reflect it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Use drgn_not_found where it's more appropriate, and check explicitly
against drgn_stop instead of err->code == DRGN_ERROR_STOP.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In some places, we add __ preceding and following an attribute name, and
in some places, we don't. Let's make it consistent. We might as well opt
for the __ to make clashes with macros less likely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_program_member_info() was replaced by drgn_type_find_member() in
commit e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with
drgn_type_find_member()"). drgn_object_pointer_offset() never existed;
it's supposed to be drgn_object_dereference_offset().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The definitions were removed but these public declarations weren't.
Fixes: 7d7aa7bf7b ("libdrgn/python: remove Type == operator")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:
- Byte order is only meaningful for scalars. What does it mean for a
struct to be big endian? A struct doesn't have a most or least
significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
byte order (DW_AT_endianity). The only producer I could find that uses
this is GCC for the scalar_storage_order type attribute, and it only
uses it for base types, not variables. GDB only seems to use to check
it for base types, as well.
So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.
The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add powerpc specific register information required to retrive the
stack traces of the tasks on both live system and from the core dump.
It uses the existing DSL format to define platform registers and
helper functions to initial them. It also adds architecture specific
information to enable powerpc. Current support is for little-endian
powerpc only.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.
While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for adding a "real", internal-only struct
drgn_stack_frame, replace the existing struct drgn_stack_frame with
explicit trace/frame arguments.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.
With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.
Based on a patch from Jay Kamat.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for struct drgn_type referencing struct drgn_object, move
the former after the latter.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add drgn_type_has_member() to libdrgn and Type.has_member() to the
Python bindings. This can simplify some version checks, like the one in
_for_each_block_device() since commit 9a10a927b0 ("helpers: fix
for_each_{disk,partition}() on kernels >= v5.1").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We can get struct drgn_object down from 40 bytes to 32 bytes (on x86-64)
by moving the bit_offset and little_endian members out of the value and
reference structs.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:
1. When we store a buffer with a bit_offset, we're storing useless
padding bits.
2. bit_offset describes a location, or in other words, part of an
address. This makes sense for references, but not for values, which
are just a bag of bytes.
Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).
This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.
Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.
So, let's reimagine types as being owned by a program. This has a few
parts:
1. In libdrgn, simple types are now created by factory functions,
drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
Program class.
5. While we're changing the API, the parameters to pointer_type() and
array_type() are reordered to be more logical (and to allow
pointer_type() to take a default size of None for the program's
default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
argument only.
A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
Signed-off-by: Omar Sandoval <osandov@osandov.com>