drgn_compound_type_from_dwarf() and drgn_enum_type_from_dwarf() check
the DW_AT_declaration flag to decide whether the type is a declaration
of an incomplete type or a definition of a complete type. However, they
check DW_AT_declaration with dwarf_attr_integrate(), which follows the
DW_AT_specification reference if it is present. The DIE referenced by
DW_AT_specification typically is a declaration, so this erroneously
identifies definitions as declarations. Additionally, if
drgn_debug_info_find_complete() finds the same definition, we can end up
recursing until we hit the DWARF parsing depth limit. Fix it by not
using dwarf_attr_integrate() for DW_AT_declaration.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
In some places, we add __ preceding and following an attribute name, and
in some places, we don't. Let's make it consistent. We might as well opt
for the __ to make clashes with macros less likely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_object_dereference_offset() and drgn_object_member_dereference()
are both in drgn.h.in but aren't exported. They should be.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_program_member_info() was replaced by drgn_type_find_member() in
commit e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with
drgn_type_find_member()"). drgn_object_pointer_offset() never existed;
it's supposed to be drgn_object_dereference_offset().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The definitions were removed but these public declarations weren't.
Fixes: 7d7aa7bf7b ("libdrgn/python: remove Type == operator")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We need to use the offset of the member in the outermost object type,
not the offset in the immediate containing type in the case of nested
anonymous structs.
Fixes: e72ecd0e2c ("libdrgn: replace drgn_program_member_info() with drgn_type_find_member()")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, reference objects and buffer value objects have a byte order.
However, this doesn't always make sense for a couple of reasons:
- Byte order is only meaningful for scalars. What does it mean for a
struct to be big endian? A struct doesn't have a most or least
significant byte; its scalar members do.
- The DWARF specification allows either types or variables to have a
byte order (DW_AT_endianity). The only producer I could find that uses
this is GCC for the scalar_storage_order type attribute, and it only
uses it for base types, not variables. GDB only seems to use to check
it for base types, as well.
So, remove the byte order from objects, and move it to integer, boolean,
floating-point, and pointer types. This model makes more sense, and it
means that we can get the binary representation of any object now.
The only downside is that we can no longer support a bit offset for
non-scalars, but as far as I can tell, nothing needs that.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Rename struct drgn_object_type to struct drgn_operand_type, add a new
struct drgn_object_type which contains all of the type-related fields
from struct drgn_object, and use it to implement drgn_object_type() and
drgn_object_type_operand(), which are replacements for
drgn_object_set_common() and drgn_object_type_encoding_and_size(). This
cleans up a lot of the boilerplate around initializing objects.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We've nominally supported complex types since commit 75c3679147
("Rewrite drgn core in C"), but parsing them from DWARF has been
incorrect from the start (they don't have a DW_AT_type attribute like we
assume), and we never implemented proper support for complex objects.
Drop the partial implementation; we can bring it back (properly) if
someone requests it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The condition register fields are numbered from most significant to
least significant. Also, the CFI for unwinding the condition register
fields restores them in their position in the condition register, so do
the same when initially populating them.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add powerpc specific register information required to retrive the
stack traces of the tasks on both live system and from the core dump.
It uses the existing DSL format to define platform registers and
helper functions to initial them. It also adds architecture specific
information to enable powerpc. Current support is for little-endian
powerpc only.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
enum drgn_register_number in the public libdrgn API and
drgn.Register.number in the Python bindings are basically exports of
DWARF register numbers. They only exist as a way to identify registers
that's lighter weight than string lookups. libdrgn already has struct
drgn_register, so we can use that to identify registers in the public
API and remove enum drgn_register_number. This has a couple of benefits:
we don't depend on DWARF numbering in our API, and we don't have to
generate drgn.h from the architecture files. The Python bindings can
just use string names for now. If it seems useful, StackFrame.register()
can take a Register in the future, we'll just need to be careful to not
allow Registers from the wrong platform.
While we're changing the API anyways, also change it so that registers
have a list of names instead of one name. This isn't needed for x86-64
at the moment, but will be for architectures that have multiple names
for the same register (like ARM).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add a helper based on PyModule_AddType() from Python 3.9 and use it to
simplify PyInit__drgn(). Also handle errors in PyInit__drgn().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for adding a "real", internal-only struct
drgn_stack_frame, replace the existing struct drgn_stack_frame with
explicit trace/frame arguments.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
It's easier to go from drgn_debug_info_module to Dwfl_Module than the
other direction, and I'd rather use the "higher-level"
drgn_debug_info_module wherever possible. So, store
drgn_debug_info_module in the DWARF index (which also saves a
dereference while building the index), and pass around
drgn_debug_info_module when parsing types/objects.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Otherwise, an invalid DW_TAG_template_value_parameter can be confused
for a type parameter.
Fixes: 352c31e1ac ("Add support for C++ template parameters")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The address of a per-CPU variable is really an offset into the per-CPU
area, but we're applying the load bias (i.e., KASLR offset) to it as if
it were an address, resulting in an invalid pointer when it's eventually
passed to per_cpu_ptr().
Fix this by applying the bias only if it the address is in the module's
address range. This heuristic avoids any Linux kernel-specific logic;
hopefully it doesn't have any undesired side effects.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We're going to need the module start and end in
drgn_object_from_dwarf_variable(), so pass the Dwfl_Module around and
get the bias when we need it. This means we don't need the bias from
drgn_dwarf_index_get_die(), so get rid of that, too.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We're using task->thread.sp for rsp in the initial frame for both the
struct inactive_task_frame path and frame pointer path. This is not
correct for either.
For kernels with struct inactive_task_frame, task->thread.sp points to
to the struct inactive_task_frame. The stack pointer in the initial
frame is the address immediately after the struct inactive_task_frame.
For kernels without struct inactive_task_frame, task->thread.sp points
to the saved rbp. We follow that rbp to the rbp and return address for
the initial frame; its stack pointer is the address immediately after
those.
Fixes: 10142f922f ("Add basic stack trace support")
Fixes: 51596f4d6c ("libdrgn: x86-64: remove garbage initial stack frame on old kernels")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add struct drgn_type_template_parameter to libdrgn, the corresponding
TypeTemplateParameter to the Python bindings, and support for parsing
them from DWARF.
With this, support for templates is almost, but not quite, complete. The
main wart is that DW_TAG_name of compound types includes the template
parameters, so the type tag includes it as well. We should remove that
from the tag and instead have the type formatting code add it only when
getting the full type name.
Based on a patch from Jay Kamat.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for calling the object parsing code from the type parsing
code, move it up in the file (and update the coding style in
drgn_object_from_dwarf_enumerator() while we're at it).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In order to support static members, methods, default function arguments,
and value template parameters, we need to be able to store a drgn_object
in a drgn_type_member or drgn_type_parameter. These are all cases where
we want lazy evaluation, so we can replace drgn_lazy_type with a new
drgn_lazy_object which implements the same idea but for objects. Types
can still be represented with an absent object.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Getting the bit field size of a member will soon require evaluating the
lazy type, so return it from drgn_member_type() instead of accessing it
directly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In preparation for struct drgn_type referencing struct drgn_object, move
the former after the latter.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We're not applying the zero-length array workaround when the array type
is qualified. Make sure we pass through can_be_incomplete_array when
parsing DW_TAG_{const,restrict,volatile,atomic}_type.
Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the language for a DWARF type is not found or unrecognized, we should
fall back to the global default, not the program default (the program
default language is for language-specific operations on the program, so
DWARF parsing shouldn't depend on it). Add a fall_back parameter to
drgn_language_from_die() and use it in DWARF parsing, and replace
drgn_language_or_default() with a drgn_default_language variable.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We should only increment a held object's reference count when it is
initially inserted into the set; subsequent holds are no-ops.
Fixes: a8d632b4c1 ("libdrgn/python: use F14 instead of PyDict for Program::objects")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
realloc(ptr, 0) is equivalent to free(ptr). It may return NULL, in which
case vector_do_shrink_to_fit() won't update the vector's data and
capacity. A subsequent append will then try to reuse the previous
allocation, causing a use-after-free. free() empty vectors explicitly
instead.
Fixes: 8d52536271 ("libdrgn: add common vector implementation")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Right now, an empty builder vector will not have anything to free, but
if we start pre-reserving these later, it will be a leak.
Fixes: c7af566c6e ("libdrgn: deduplicate all types with no members/parameters/enumerators")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Even if a compound, function, or enumerated type is complete, we can
still deduplicate it as long as it doesn't have members, parameters, or
enumerators.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Without this, the only way to check whether an object is absent in
Python is to try to use the object and catch the ObjectAbsentError.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I was going to add an Object.available_ attribute, but that made me
realize that the naming is somewhat ambiguous, as a reference object
with an invalid address might also be considered "unavailable" by users.
Use the name "absent" instead, which is more clear: the object isn't
there at all.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Make TypeMember.bit_field_size consistent with Object.bit_field_size_ by
using None to represent a non-bit field instead of 0.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The == operator on drgn.Type is only intended for testing. It's
expensive and slow and not what people usually want. It's going to get
even more awkward to define once types can refer to objects (for
template parameters and static members and such). Let's replace == with
a new identical() function only available in unit tests. Then, remove
the operator from the Python bindings as well as the underlying libdrgn
drgn_type_eq() and drgn_qualified_type_eq() functions.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Currently, we try to emulate the GNU C extension of casting a struct
type to itself. This does a deep type comparison, which is expensive. We
could take a shortcut like only comparing the kind and type name, but
seeing as standard C only allows casting to a scalar type, let's drop
support for casting to a struct (or other non-scalar) type entirely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
offsetof() can almost be implemented with Type.member(name).offset, but
that doesn't parse member designators. Add an offsetof() function that
does (and add drgn_type_offsetof() in libdrgn).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add drgn_type_has_member() to libdrgn and Type.has_member() to the
Python bindings. This can simplify some version checks, like the one in
_for_each_block_device() since commit 9a10a927b0 ("helpers: fix
for_each_{disk,partition}() on kernels >= v5.1").
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In Python, looking up a member in a drgn Type by name currently looks
something like:
member = [member for member in type.members if member.name == "foo"][0]
Add a Type.member(name) method, which is both easier and more efficient.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that types are associated with their program, we don't need to pass
the program separately to drgn_program_member_info() and can replace it
with a more natural drgn_type_find_member() API that takes only the type
and member name. While we're at it, get rid of drgn_member_info and
return the drgn_type_member and bit_offset directly. This also fixes a
bug that drgn_error_member_not_found() ignores the member name length.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The TypeMember and TypeParameter instances referring to a libdrgn
drgn_lazy_type are only valid as long as the Type containing them is
still alive. Hold a reference on the containing Type from LazyType. We
can do this without growing LazyType by getting rid of the enum state
and using sentinel values for LazyType::lazy_type as the state.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We can get struct drgn_object down from 40 bytes to 32 bytes (on x86-64)
by moving the bit_offset and little_endian members out of the value and
reference structs.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of reasons that it was the wrong choice to have a
bit_offset for value objects:
1. When we store a buffer with a bit_offset, we're storing useless
padding bits.
2. bit_offset describes a location, or in other words, part of an
address. This makes sense for references, but not for values, which
are just a bag of bytes.
Get rid of union drgn_value.bit_offset in libdrgn, make
Object.bit_offset None for value objects, and disallow passing
bit_offset to the Object() constructor when creating a value. bit_offset
can still be passed when creating an object from a buffer, but we'll
shift the bytes down as necessary to store the value with no offset.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
THREAD_SIZE is still broken and I haven't looked into the root cause
(see commit 95be142d17 ("tests: disable THREAD_SIZE test")). We don't
need it anymore anyways, so let's remove it entirely.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
On old kernels, we set the initial frame as containing only rbp and let
libdwfl unwind it assuming frame pointers from there. This means that
the initial frame has a garbage rip. Follow the frame pointer and set
the previous rbp and return address ourselves instead.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are some situations where we can find an object but can't
determine its value, like local variables that have been optimized out,
inlined functions without a concrete instance, and pure virtual methods.
It's still useful to get some information from these objects, namely
their types. Let's add the concept of an "unavailable" object, which is
an object with a known type but unknown value/address.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I'd like to use the name drgn_object_kind to distinguish between values
and references. "Encoding" is more accurate than "kind", anyways.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are several places where we'd like to enforce that every
enumeration is handled in a switch. Add SWITCH_ENUM() and
SWITCH_ENUM_DEFAULT() macros for that and use them.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If virtual address translation isn't implemented for the target
architecture, then we shouldn't add the page table memory reader. If we
do, we get a DRGN_ERROR_INVALID_ARGUMENT error from
linux_helper_read_vm() instead of a DRGN_ERROR_FAULT error as expected.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the DWARF index encounters any error while parsing, it returns an
error saying only "debug information is truncated", which makes it hard
to track down parsing errors. The kmod index parser silently swallows
errors. For both, replace the mread functions with a higher-level
binary_buffer interface that can include more information including the
location of the error. For example:
/tmp/mybinary: .debug_info+0x4: expected at least 56 bytes, have 55
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Back in commit 9ce9094ee0 ("libdrgn: dwarf_index: don't copy sections
into each CU"), I changed the sections to be individual members. The
next change will be easier if they're in an array.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are several places where we manually pass around the string name
of a tag so it can be used for error messages. Do it programatically
instead.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Linux v5.8 changed the module section structure, so we need to get the
section name differently.
Closes#73.
Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If cache_kernel_module_sections() in report_loaded_kernel_module()
fails, we continue to the next iteration without advancing to the next
kernel module. Then, we fail on that same kernel module and repeat. Make
sure that we go to the next kernel module.
Fixes: 423d2cd500 ("libdrgn: dwarf_index: rework file reporting")
Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We're freeing path and then using it to report an error.
This has some weird knock-on effects. Since we freed the path, the error
message contains garbage. So, PyErr_SetString() can't decode it as a
UTF-8 string. The end result is a MissingDebugInfoError with no message.
Fix it by creating the error before freeing the path.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We use /proc/modules and /sys/module to find loaded kernel modules for
the running kernel instead of walking the module list in the core dump
as an optimization. To make it easier to test the core dump path, add an
environment variable to disable the optimization.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The next commit will allow using the offline path for the live kernel,
so the offline naming won't make much sense. Fold the offline path into
the top-level functions, and make the live path an escape hatch. Also
add some comments and improve naming for the file and directory handles
and update the coding style.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
These were added in commit e5874ad18a ("libdrgn: use libdwfl"), but
they have never been used. Remove them.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use it in hash_table.h. While we're here, let's split these into their
own header file and document them better.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).
This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The elfutils header files should be treated as if they were in the
standard location, so use -isystem instead of -I.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If we create a pending CU for a namespace, then add more CUs to the
index, the CU might get reallocated, resulting in a use after free. Fix
it by storing the index of the CU instead of the pointer.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.
This also reworks a couple of details of loading debugging information:
- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
compilation units (by using OpenMP tasks).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the
DWARF index, with each namespace being its own sub-index. We only index
the namespace itself when it is first accessed, which should help with
startup time and simplifies tracking.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
In order to index namespaces lazily, we need the CU structures. Rename
struct compilation_unit to the less generic struct drgn_dwarf_index_cu
and keep the CUs in a vector in the dindex.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
For namespace support, we will want to access the struct
drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split
drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of related ways that we can cause undefined behavior
when parsing a malformed DWARF or depmod index file:
1. There are several places where we increment the cursor to skip past
some data. It is undefined behavior if the result points out of
bounds of the data, even if we don't attempt to dereference it.
2. read_in_bounds() checks that ptr <= end. This pointer comparison is
only defined if ptr and end both point to elements of the same array
object or one past the last element. If ptr has gone past end, then
this comparison is likely undefined anyways.
Fix it by adding a helper to skip past data with bounds checking. Then,
all of the helpers can assume that ptr <= end and maintain that
invariant. while we're here and auditing all of the call sites, let's
clean up the API and rename it from read_foo() to the less generic
mread_foo().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that we can handle a DW_AT_specification that references another
compilation unit, add support for DW_FORM_ref_addr.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We currently handle DIEs with a DW_AT_specification attribute by parsing
the corresponding declaration to get the name and inserting the DIE as
usual. This has a couple of problems:
1. It only works if DW_AT_specification refers to the same compilation
unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not
DW_FORM_ref_addr. As a result, drgn doesn't support the latter.
2. It assumes that the DIE with DW_AT_specification is in the correct
"scope". Unfortunately, this is not true for g++: for a variable
definition in a C++ namespace, it generates a DIE with
DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE
which refers to the declaration with DW_AT_specification _outside_ of
the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE.
Supporting both of these cases requires reworking how we handle
DW_AT_specification. This commit takes an approach of parsing the DWARF
data in two passes: the first pass reads the abbrevation and file name
tables and builds a map of instances of DW_AT_specification; the second
pass indexes DIEs as before, but ignores DIEs with DW_AT_specification
and handles DIEs with DW_AT_declaration by looking them up in the map
built by the first pass.
This approach is a 10-20% regression in indexing time in the benchmarks
I ran. Thankfully, it is not 100% slower for a couple of reasons. The
first is that the two passes are simpler than the original combined
pass. The second is that a decent part of the indexing time is spent
faulting in the mapped debugging information, which only needs to happen
once (even if the file is cached, minor page faults add non-negligible
overhead).
This doesn't handle DW_AT_specification "chains" yet, but neither did
the original code. If it is necessary, it shouldn't be too difficult to
add.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
It's very unlikely that we'll ever index more than 4 billion DIEs in a
single shard, so we can shrink the index a bit by using uint32_t
indices (and uint8_t tag).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally copied the sections into each compilation unit to avoid a
pointer indirection, but performance-wise it's a wash, so we might as
well save the memory. This will be more important when we keep the CUs
after indexing.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In read_cus(), the master thread can use the final CUs vector directly
and the rest of the threads can merge their private vectors in. This
consistently shaves a few milliseconds off of startup.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We currently assume that if DW_AT_declaration is present, it is true.
This seems to be true in practice, and I see no reason to ever use
DW_FORM_flag with a value of zero. There's no performance hit to handle
it, though, so we might as well.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
As a small simplification, we can take commit 9bb2ccecb7 ("Enable
DWARF indexing to work with partial units") further and not look at the
tag of the top-level DIE at all.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The CU unit length and DIE offset are both limited by the size of the
mapped debugging information, i.e., size_t.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This is more clear: although these flags happen to be encoded with the
DWARF tag, they are flags regarding the DIE.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If we fail to read an include directory in read_file_name_table(), we
need to free the directory hashes.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.
Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.
So, let's reimagine types as being owned by a program. This has a few
parts:
1. In libdrgn, simple types are now created by factory functions,
drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
Program class.
5. While we're changing the API, the parameters to pointer_type() and
array_type() are reordered to be more logical (and to allow
pointer_type() to take a default size of None for the program's
default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
argument only.
A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
struct drgn_program has a bunch of state scattered around. Group it
together more logically, even if it means sacrificing some padding here
and there.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Program::objects is used to store references to objects that must stay
alive while the Program is alive. It is currently a PyDict where the
keys are the object addresses as PyLong and the values are the objects
themselves. This has two problems:
1. Allocating the key as a full object is obviously wasteful.
2. PyDict doesn't have an API for reserving capacity ahead of time,
which we want for an upcoming change.
Both of these are easily fixed by using our own hash table.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
drgn_object_init() is available in drgh.h file and seems to a required
call before calling drgn_program_find_object().
Without this, trying to call drgn_object_init() from an external C
application results in undefined reference.
Signed-off-by: Aditya Sarwade <asarwade@fb.com>
Now that drgndoc can handle overloads and we have the IntegerLike and
Path aliases, we can add type annotations to all helpers. There are also
a couple of functional changes that snuck in here to make annotating
easier.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Rather than duplicating Union[str, bytes, os.PathLike] everywhere, add
an alias. Also make it explicitly os.PathLike[str] or os.PathLike[bytes]
to get rid of some mypy --strict errors.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Lots if interfaces in drgn transparently turn an integer Object into an
int by using __index__(), so add an IntegerLike protocol for this and
use it everywhere applicable.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Make the KASLR offset available to Python in a new
drgn.helpers.linux.boot module, and move pgtable_l5_enabled() there,
too.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Clean up the coding style of the remaining few places that the last
couple of changes didn't rewrite.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The folly F14 implementation provides 3 storage policies: value, node,
and vector. The default F14FastMap/F14FastSet chooses between the value
and vector policies based on the value size.
We currently only implement the value policy, as the node policy is easy
to emulate and the vector policy would've added more complexity. This
adds support for the vector policy (adding even more C abuse :) and
automatically chooses the policy the same way as folly. It'd be easy to
add a way to choose the policy if needed.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The only major change to the folly F14 implementation since I originally
ported it is commit 3d169f4365cf ("memory savings for F14 tables with
explicit reserve()"). That is a small improvement for small tables and a
large improvement for vector tables, which are about to be added.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
posix_memalign() doesn't have the restriction that the size must be a
multiple of the alignment like aligned_alloc() does in C11.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Compile-time constants have DW_AT_const_value instead of DW_AT_location.
We can translate those to a value object.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Really it's more of a test program than an example program. It's useful
for benchmarking, testing with valgrind, etc. It's not built by default,
but it can be built manually with:
$ make -C build/temp.* examples/load_debug_info
And run with:
$ ./build/temp.*/examples/load_debug_info
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Commit eea5422546 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
I once tried to implement a generic arithmetic right shift macro without
relying on any implementation-defined behavior, but this turned out to
be really hard. drgn is fairly tied to GCC and GCC-compatible compilers
(like Clang), so let's just assume GCC's model [1]: modular conversion
to signed types, two's complement signed bitwise operators, and sign
extension for signed right shift.
1: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.
This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
We were forgetting to mask away the extra bits. There are two places
that we use the tag without converting it to a uint8_t:
hash_table_probe_delta(), which is mostly benign since we mask it by the
chunk mask anyways; and table_chunk_match() without SSE 2, which
completely breaks.
While we're here, let's align the comments better.
After thinking about it some more, I realized that "libdwfl: simplify
activation frame logic" breaks the case where during unwinding someone
queries isactivation for reasons other than knowing whether to decrement
program counter. Revert the patch and refactor "libdwfl: add interface
for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame" to handle it
differently.
Based on:
c95081596 size: Also obey radix printing for bsd format.
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
drgn has a couple of issues unwinding stack traces for kernel core
dumps:
1. It can't unwind the stack for the idle task (PID 0), which commonly
appears in core dumps.
2. It uses the PID in PRSTATUS, which is racy and can't actually be
trusted.
The solution for both of these is to look up the PRSTATUS note by CPU
instead of PID.
For the live kernel, drgn refuses to unwind the stack of tasks in the
"R" state. However, the "R" state is running *or runnable*, so in the
latter case, we can still unwind the stack. The solution for this is to
look at on_cpu for the task instead of the state.
We currently unwind from pt_regs and NT_PRSTATUS using an array of
register definitions. It's more flexible and more efficient to do this
with an architecture-specific callback. For x86-64, this change also
makes us depend on the binary layout rather than member names of struct
pt_regs, but that shouldn't matter unless people are defining their own,
weird struct pt_regs.
The model has always been that drgn Objects are immutable, but for some
reason I went through the trouble of allowing __init__() to reinitialize
an already initialized Object. Instead, let's fully initialize the
Object in __new__() and get rid of __init__().
It's annoying to have to do value= when creating objects, especially in
interactive mode. Let's allow passing in the value positionally so that
`Object(prog, "int", value=0)` becomes `Object(prog, "int", 0)`. It's
clear enough that this is creating an int with value 0.
drgn was originally my side project, but for awhile now it's also been
my work project. Update the copyright headers to reflect this, and add a
copyright header to various files that were missing it.
For functions that call a noreturn function, the compiler may omit code
after the call instruction. This means that the return address may not
lie in the caller's symbol. dwfl_frame_pc() returns whether a frame is
an "activation", i.e., its program counter is guaranteed to lie within
the caller. This is only the case for the initial frame, frames
interrupted by a signal, and the signal trampoline frame. For everything
else, we need to decrement the program counter before doing any lookups.
Rebase on master and fix dwfl_frame_module/dwfl_frame_dwarf_frame to
decrement the program counter when necessary.
Based on:
a8493c12a libdw: Skip imported compiler_units in libdw_visit_scopes walking DIE tree
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: simplify activation frame logic
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
Now that we can walk page tables, we can use it in a memory reader that
reads kernel memory via the kernel page table. This means that we don't
need libkdumpfile for ELF vmcores anymore (although I'll keep the
functionality around until this code has been validated more).
I originally wanted to avoid depending on another vmcoreinfo field, but
an the next change is going to depend on swapper_pg_dir in vmcoreinfo
anyways, and it ends up being simpler to use it.
There are a few big use cases for this in drgn:
* Helpers for accessing memory in the virtual address space of userspace
tasks.
* Removing the libkdumpfile dependency for vmcores.
* Handling gaps in the virtual address space of /proc/kcore (cf. #27).
I dragged my feet on implementing this because I thought it would be
more complicated, but the page table layout on x86-64 isn't too bad.
This commit implements page table walking using a page table iterator
abstraction. The first thing we'll add on top of this will be a helper
for reading memory from a virtual address space, but in the future it'd
also be possible to export the page table iterator directly.
UNARY_OP_SIGNED_2C() uses a union of int64_t and uint64_t to avoid
signed integer overflow... except that there's a typo and the uint64_t
is actually an int64_t. Fix it and add a test that would catch it with
-fsanitize=undefined.
internal.h includes both drgn-specific helpers and generic utility
functions. Split the latter into their own util.h header and use it
instead of internal.h in the generic data structure code. This makes it
easier to copy the data structures into other projects/test programs.
Program_load_debug_info() is the last user of the
resize_array()/realloc_array() utility functions. We can clean it up by
using a vector and finally get rid of those functions.
This also happens to fix three bugs in Program_load_debug_info(): we
weren't setting a Python exception if we couldn't allocate the path_args
array, we weren't zeroing path_args after resizing the array, and we
weren't freeing the path_args array. Shame on whoever wrote this.
DRGN_UNREACHABLE() currently expands to abort(), but assert() provides
more information. If NDEBUG is defined, we can use
__builtin_unreachable() instead.
DRGN_UNREACHABLE() isn't drgn-specific, so this renames it to
UNREACHABLE(). It's also not really related to errors, so this moves it
to internal.h.
Before Linux v4.11, /proc/kcore didn't have valid physical addresses, so
it's currently not possible to read from physical memory on old kernels.
However, if we can figure out the address of the direct mapping, then we
can determine the corresponding physical addresses for the segments and
add them.
We treat core dumps with all zero p_paddrs as not having valid physical
addresses. However, it is theoretically possible for a kernel core dump
to only have one segment which legitimately has a p_paddr of 0 (e.g., if
it only has a segment for the direct mapping, although note that this
isn't currently possible on x86, as Linux on x86 reserves PFN 0 for the
BIOS [1]).
If the core dump has a VMCOREINFO note, then it is either a vmcore,
which has valid physical addresses, or it is /proc/kcore with Linux
kernel commit 23c85094fe18 ("proc/kcore: add vmcoreinfo note to
/proc/kcore") (in v4.19), so it must also have Linux kernel commit
464920104bf7 ("/proc/kcore: update physical address for kcore ram and
text") (in v4.11) (ignoring the possibility of a franken-kernel which
backported the former but not the latter). Therefore, treat core dumps
with a VMCOREINFO note as having valid physical addresses.
1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/setup.c?h=v5.6#n678
Even on 32-bit architectures, physical addresses can be 64 bits (e.g.,
x86 with PAE). It's unlikely that the vmcoreinfo note would be in such
an address, but let's always parse it as a uint64_t just to be safe.
I've found that I do this manually a lot (e.g., when digging through a
task's stack). Add shortcuts for reading unsigned integers and a note
for how to manually read other formats.
DrgnObject_getattro() uses PyObject_GenericGetAttr() and catches the
AttributeError raised if the name is not an attribute of the Object
class. If the member is found, we then destroy the AttributeError.
Raising an exception only to destroy it is obviously wasteful. Luckily,
as of Python 3.7, the lower-level _PyObject_GenericGetAttrWithDict() can
suppress the AttributeError; we can raise it ourselves if we need it. In
my microbenchmarks, this makes Object.__getattribute__() at least twice
as fast when the member exists.
This also fixes a drgn_error leak.
Loading debuginfo from a kernel with separate or re-combined debuginfo
results in the following failure:
Traceback (most recent call last):
File "/home/jeffm/.local/bin/drgn", line 11, in <module>
load_entry_point('drgn==0.0.3+39.gbf05d9bf', 'console_scripts', 'drgn')()
File "/home/jeffm/.local/lib/python3.6/site-packages/drgn-0.0.3+39.gbf05d9bf-py3.6-linux-x86_64.egg/drgn/internal/cli.py", line 121, in main
prog.load_debug_info(args.symbols, **args.default_symbols)
Exception: invalid DW_AT_decl_file 10
A typical kernel build results in DWARF information with
DW_TAG_compile_unit tags but the objcopy --only-keep-debug and eu-unstrip
commands use DW_TAG_partial_unit tags. The DWARF indexer only handles
the former, so the file table isn't populated and we get the exception.
Fortunately, the two are more or less interchangeable. (See
http://dwarfstd.org/doc/Dwarf3.pdf, section E.2.3). Accepting both for
indexing compile units results in a working session.
compound_initializer_init_next() saves a pointer to the compound
initializer stack and uses it after appending to the stack, which may
have reallocated the stack.
For C++ support, we need to add an array of template parameters to
struct drgn_type. struct drgn_type already has arrays for members,
enumerators, and parameters embedded at the end of the structure,
because no type needs more than one of those. However, struct, union,
and class types may need members and template parameters. We could add a
separate array of templates, but then it gets confusing having two
methods of storing arrays in struct drgn_type. Let's make these arrays
separate instead of embedding them.
Similarly to PAGE_OFFSET, vmemmap makes more sense as part of the Linux
kernel object finder than an internal helper.
While we're here, let's fix the definition for 5-level page tables. This
only matters for kernels with commit 77ef56e4f0fb ("x86: Enable 5-level
paging support via CONFIG_X86_5LEVEL=y") but without eedb92abb9bb
("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y")
(namely, v4.14, v4.15, and v4.16); since v4.17, 5-level page table
support enables KASLR.
The internal _page_offset() helper gets the value of PAGE_OFFSET, but
the fallback when KASLR is disabled has been out of date since Linux
v4.20 and never handled 5-level page tables. Additionally, it makes more
sense as part of the Linux kernel (formerly vmcoreinfo) object finder so
that it's cleanly accessible outside of drgn internals.
The configure script allows the user to not use any openmp
implementation but dwarf_index.c uses the locking APIs unconditionally.
This compiles but fails at runtime.
Adding simple stubs for the locking API. This is useful when debugging
crashes in dwarf indexing during development.
elfutils 0.179 was just released, and it includes my fix for vmcores
with >= 2^16 phdrs.
Based on:
3a7728808 Prepare for 0.179
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
Jay reported that the default language detection was happening too early
and not finding "main". We need to make sure to do it after the DWARF
index is actually populated. The problem with that is that it makes
error reporting much harder, as we don't want to return a fatal error
from drgn_program_set_language_from_main() if we actually succeeded in
loading debug info. That means we probably need to ignore errors in
drgn_program_set_language_from_main(). To reduce the surface area where
we'd be failing, let's get the language directly from the DWARF index.
This also allows us to avoid setting the language if it's actually
unknown (information which is lost by the time we convert it to a
drgn_object in the current code).
Currently, drgn_language_from_die() returns the default language when it
encounters an unknown DW_LANG because the dwarf_info_cache always wants
a language. The next change will want to detect the unknown language
case, so make drgn_language_from_die() return NULL if the language is
unknown, move it to language.c, and fold drgn_language_from_dw_lang()
into it.
We need to keep the Program alive for its types to stay valid, not just
the objects the Program has pinned. (I have no idea why I changed this
in commit 565e0343ef ("libdrgn: make symbol index pluggable with
callbacks").)
For operations where we don't have a type available, we currently fall
back to C. Instead, we should guess the language of the program and use
that as the default. The heurisitic implemented here gets the language
of the CU containing "main" (except for the Linux kernel, which is
always C). In the future, we should allow manually overriding the
automatically determined language.
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
I've been wanting to add type hints for the _drgn C extension for
awhile. The main blocker was that there is a large overlap between the
documentation (in docs/api_reference.rst) and the stub file, and I
really didn't want to duplicate the information. Therefore, it was a
requirement that the the documentation could be generated from the stub
file, or vice versa. Unfortunately, none of the existing tools that I
could find supported this very well. So, I bit the bullet and wrote my
own Sphinx extension that uses the stub file as the source of truth (and
subsumes my old autopackage extension and gen_docstrings script).
The stub file is probably incomplete/inaccurate in places, but this
should be a good starting point to improve on.
Closes#22.
I thought I'd be able to avoid adding a separate API for register values
and reuse dwfl_frame_eval_expr(), but this doesn't work if the frame is
missing debug information but has known register values (e.g., if the
program crashed with an invalid instruction pointer).
Rebase on master, add the improved
dwfl_frame_module/dwfl_frame_dwarf_frame patch, and add the
dwfl_frame_register patch.
Based on:
889edd912 PR25365: debuginfod-client: restrict cleanup to client-pattern files
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: add interface for getting Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: export __libdwfl_frame_reg_get as dwfl_frame_register
libdwfl: add interface for evaluating DWARF expressions in a frame
UTS_RELEASE is currently only accessible once debug info is loaded with
prog.load_debug_info(main=True). This makes it difficult to get the
release, find the appropriate vmlinux, then load the found vmlinux. We
can add vmcoreinfo_object_find as part of set_core_dump(), which makes
it possible to do the following:
prog = drgn.Program()
prog.set_core_dump(core_dump_path)
release = prog['UTS_RELEASE'].string_()
vmlinux_path = find_vmlinux(release)
prog.load_debug_info([vmlinux_path])
The only downside is that this ends up using the default definition of
char rather than what we would get from the debug info, but that
shouldn't be a big problem.
The osrelease is accessible via init_uts_ns.name.release, but we can
also get it straight out of vmcoreinfo, which will be useful for the
next change. UTS_RELEASE is the name of the macro defined in the kernel.
This continues the conversion from the last commit. Members and
parameters are basically the same, so we can do them together. Unlike
enumerators, these don't make sense to unpack or access as sequences.
Currently, type members, enumerators, and parameters are all represented
by tuples in the Python bindings. This is awkward to document and
implement. Instead, let's replace these tuples with proper types,
starting with the easiest one, TypeEnumerator. This one still makes
sense to treat as a sequence so that it can be unpacked as (name,
value).
Currently, we print:
>>> prog.symbol(prog['init_task'])
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: cannot convert 'struct task_struct' to index
It's not obvious what it means to convert to an index. Instead, let's
use the error message raised by operator.index():
TypeError: 'struct task_struct' object cannot be interpreted as an integer
Decouple some of the responsibilities of FaultError to
OutOfBoundsError so consumers can differentiate between
invalid memory accesses and running out of bounds in
drgn Objects which may be based on valid memory address.
At Facebook, we link OpenMP code with libomp instead of libgomp. We have
an internal patch to drgn to do this, as it can't be done by setting
CFLAGS/LDFLAGS. Let's add a way to specify the OpenMP library at
configure time so that we can drop the internal patch.
When investigating a reported bug, it's important to know which exact
version of drgn is being used. Let's include the git revision in the
version printed by the CLI in PEP440 format. This commit will also serve
as the 0.0.1 release.
Currently the drgn version number is defined in drgn.h.in, and configure
and setup.py both parse it out of there. However, now that we're
generating drgn.h anyways, it's easier to make configure.ac the source
of truth.
When we're checking whether the element that we formatted on one line
would fit on the previous line, we check whether the previous line is
empty with remaining_columns == start_columns. This is never true, as
remaining_columns is always set to start_columns - 1 at most, and it
only decreases from there until we start a new line.
drgn_object_truthiness() is a misnomer, as truthiness is a
language-specific concept. Instead, invert the return value and rename
it to drgn_object_is_zero(), which more accurately conveys the meaning.
Instead of having two internal variants (drgn_find_symbol_internal() and
drgn_program_find_symbol_in_module()), combine them into the former and
add a separate drgn_error_symbol_not_found() for translating the static
error to the user-facing one. This makes things more flexible for the
next change.
We'd like to have more control over how objects are formatted. I
considered defining a custom string format specification syntax, but
that's not easily discoverable. Instead, let's get rid of the current
format specification support and replace it with a normal method.
In preparation for making drgn_pretty_print_object() more flexible
(i.e., not always "pretty"), rename it to drgn_format_object(). For
consistency, let's rename drgn_pretty_print_type_name(),
drgn_pretty_print_type(), and drgn_pretty_print_stack_trace(), too.
Commit f327552229 ("libdrgn: add strstartswith()") flipped the test
for a name entry in modinfo. This introduced a regression resulting in
kernel modules not loading at the right offset. This patch fixes the
regression.
The previous commit was the real fix for the failed symbol lookups. On
the bright side, the build fixes were merged, so we can rebase on master
and drop those.
Based on:
277c2c54f libcpu: Compile i386_lex.c with -Wno-implicit-fallthrough
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
It turns out this wasn't a problem with dwfl_addrmodule() at all; the
real problem is that .init sections are freed once the module is loaded
but we're still considering them for the address range we pass to
dwfl_report_module(). Ignore those sections entirely (by omitting them
from the section name to section index map). While we're here, let's not
bother inserting non-SHF_ALLOC sections in the map.
This fixes the issue that Program.symbol() sometimes fails for kernel
module symbols.
Based on:
2c7c4037 elfutils.spec.in: Sync with fedora spec, remove rhel/fedora specifics.
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
libdwfl: return error from __libdwfl_relocate_value for unloaded sections
libdwfl: remove broken coalescing logic in dwfl_report_segment
libdwfl: store module lookup table separately from segments
libdwfl: use sections of relocatable files for dwfl_addrmodule
For live userspace processes, we add a single [0, UINT64_MAX) memory
file segment for /proc/$pid/mem. Of course, not every address in that
range is valid; reading from an invalid address returns EIO. We should
translate this to a DRGN_ERROR_FAULT instead of DRGN_ERROR_OS, but only
for /proc/$pid/mem.
In commit 55a9700435 ("libdrgn: python: accept integer-like arguments
in more places"), I converted Program_symbol to use index_converter but
forgot to initialize the struct index_arg. Then, in commit c243daed59
("Translate find_task() helper (and dependencies) to C"), I added a
bunch more cases of uninitialized struct index_arg. If
index_arg.allow_none gets a non-zero garbage value, then this can end up
allowing None through when it shouldn't. Furthermore, since commit
2561226918 ("libdrgn: python: add signed integer support to
index_converter"), if index_arg.is_signed gets a non-zero garbage value,
then this will try to get a signed integer when we're expecting an
unsigned integer, which can blow up for values >= 2**63 (like kernel
symbols). Fix it by initializing struct index_arg everywhere.
Fixes#30.
Rebase on 0.178. The only additional change needed is to pass
--disable-debuginfod to configure.
Based on:
2c7c4037 elfutils.spec.in: Sync with fedora spec, remove rhel/fedora specifics.
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
Matt Ahrens reported that comparing two types would sometimes end up in
a seemingly infinite loop, which he discovered was because we repeat
comparisons of types as long as they're not in a cycle. Fix it by
caching all comparisons during a call.
Some tests (e.g., tests.test_object.TestSpecialMethods.test_round) are
printing:
DeprecationWarning: an integer is required (got type float). Implicit
conversion to integers using __int__ is deprecated, and may be removed
in a future version of Python.
See https://bugs.python.org/issue36048. This is coming from calls like:
Object(prog, 'int', value=1.5)
We actually want the truncating behavior, so explicitly call
PyNumber_Long().
If we only want debugging information for vmlinux and not kernel
modules, it'd be nice to only load the former. This adds a load_main
parameter to drgn_program_load_debug_info() which specifies just that.
For now, it's only implemented for the Linux kernel. While we're here,
let's make the paths parameter optional for the Python bindings.
This implements the first step at supporting C++: class types. In
particular, this adds a new drgn_type_kind, DRGN_TYPE_CLASS, and support
for parsing DW_TAG_class_type from DWARF. Although classes are not valid
in C, this adds support for pretty printing them, for completeness.
Python 3.8 replaced the unused void *tp_print field with Py_ssize_t
tp_vectorcall_offset, so with -Werror we get "error: initialization of
‘long int’ from ‘void *’ makes integer from pointer without a cast".
Let's just use designated initializers.
We currently don't check that the task we're unwinding is actually
blocked, which means that linux_kernel_set_initial_registers_x86_64()
will get garbage from the stack and we'll return a nonsense stack trace.
Let's avoid this by checking that the task isn't running if we didn't
find a NT_PRSTATUS note.
Add a helper to get the state of a task (e.g., 'R', 'S', 'D'). This will
be used to make sure that a task is not running when getting a stack
trace, so implement it in libdrgn.
When debugging the Linux kernel, it's inconvenient to have to get the
task_struct of a thread in order to get its stack trace. This adds
support for looking it up solely by PID. In that case, we do the
find_task() inside of libdrgn. This also gives us stack trace support
for userspace core dumps almost for free since we already added support
for NT_PRSTATUS.
vmcores include a NT_PRSTATUS note for each CPU containing the PID of
the task running on that CPU at the time of the crash and its registers.
We can use that to unwind the stack of the crashed tasks.
Currently, we close the Elf handle in drgn_set_core_dump() after we're
done with it. However, we need the Elf handle in
userspace_report_debug_info(), so we reopen it temporarily. We will also
need it to support getting stack traces from core dumps, so we might as
well keep it open. Note that we keep it even if we're using libkdumpfile
because libkdumpfile doesn't seem to have an API to access ELF notes.
We'd like to be able to look up tasks by PID from libdrgn, but those
helpers are written in Python. Translate them to C and add some thin
bindings so we can use the same implementation from Python.
There are some cases where we want to read an integer regardless of its
signedness, so drgn_object_read_signed() and drgn_object_read_unsigned()
are cumbersome to use, and drgn_object_read_value() is too permissive.
Currently, the only information available from a stack frame is the
program counter. Eventually, we'd like to add support for getting
arguments and local variables, but that will require more work. In the
mean time, we can at least get the values of other registers. A
determined user can read the assembly for the code they're debugging and
derive the values of variables from the registers.
Rebase the existing patches and add the patches which extend the libdwfl
stack frame interface.
Based on:
47780c9e elflint, readelf: enhance error diagnostics
With the following patches:
configure: Add --disable-programs
configure: Add --disable-shared
configure: Fix -D_FORTIFY_SOURCE=2 check when CFLAGS contains -Wno-error
libcpu: compile i386_lex.c with -Wno-implicit-fallthrough
libdwfl: don't bother freeing frames outside of dwfl_thread_getframes
libdwfl: only use thread->unwound for initial frame
libdwfl: add interface for attaching to/detaching from threads
libdwfl: cache Dwfl_Module and Dwarf_Frame for Dwfl_Frame
libdwfl: add interface for evaluating DWARF expressions in a frame
In order to retrieve registers from stack traces, we need to know what
registers are defined for a platform. This adds a small DSL for defining
registers for an architecture. The DSL is parsed by an awk script that
generates the necessary tables, lookup functions, and enum definitions.