We currently deduplicate entries in the DWARF index by (name, tag, file
name). We want to add support for looking up nested classes, so this is
a problem: not every DIE defining a class also defines all of its nested
types, so the one DIE we index may not allow us to find every nested
class. Instead, we need to index every DIE with a given name.
This sounds horribly expensive, both in terms of CPU and memory, but we
can mitigate this in several ways:
- We no longer need to parse the file name table, cache file name
hashes, parse DW_AT_decl_file, or store the file name hash for indexed
DIEs.
- Instead of storing the tag for each indexed DIE, we can split the DIE
map into a map per tag.
- We can store the DIEs matching a name in a vector instead of a linked
list.
- We can use the new inline entry and small size variants of vectors.
- We can move struct drgn_namespace_dwarf_index * to a tree separate
from the indexed DIEs. After all of these changes, we only need a
single uintptr_t per indexed DIE.
- We can get rid of the struct drgn_dwarf_index_pending_die list for a
namespace and use the indexed DIEs instead, which are half the size.
- DW_TAG_base_type DIEs can be assumed to be globally unique by name, so
they can be stored in their own map that holds a single DIE per name.
- Each thread can independently build the DIE maps without any
synchronization to be merged at the end.
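As a rough illustration of the layout described above (one map per tag,
every DIE with a given name kept in an array, one uintptr_t per DIE),
here is a minimal sketch. All of the sketch_* names are hypothetical,
and the linear array stands in for the real hash table purely for
brevity; this is not the code in this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of tags; a real index would key on DWARF tag values. */
enum sketch_tag {
	SKETCH_TAG_CLASS,
	SKETCH_TAG_STRUCT,
	SKETCH_TAG_ENUM,
	SKETCH_NUM_TAGS,
};

/* All DIEs sharing one name: a growable array instead of a linked list. */
struct sketch_entry {
	const char *name; /* borrowed; must outlive the index */
	uintptr_t *dies; /* one uintptr_t per indexed DIE */
	size_t num_dies;
};

/* One map per tag, so an entry never needs to store its tag. */
struct sketch_map {
	struct sketch_entry *entries; /* linear array, for brevity only */
	size_t num_entries;
};

struct sketch_index {
	struct sketch_map maps[SKETCH_NUM_TAGS];
};

static struct sketch_entry *sketch_map_find(const struct sketch_map *map,
					    const char *name)
{
	for (size_t i = 0; i < map->num_entries; i++) {
		if (strcmp(map->entries[i].name, name) == 0)
			return &map->entries[i];
	}
	return NULL;
}

/* Record one more DIE for (tag, name). Returns 0 on success, -1 if out of
 * memory. Nothing is deduplicated: every DIE with the name is kept. */
int sketch_index_add(struct sketch_index *index, enum sketch_tag tag,
		     const char *name, uintptr_t die)
{
	struct sketch_map *map = &index->maps[tag];
	struct sketch_entry *entry = sketch_map_find(map, name);
	if (!entry) {
		struct sketch_entry *entries =
			realloc(map->entries,
				(map->num_entries + 1) * sizeof(*entries));
		if (!entries)
			return -1;
		map->entries = entries;
		entry = &entries[map->num_entries++];
		*entry = (struct sketch_entry){ .name = name };
	}
	uintptr_t *dies = realloc(entry->dies,
				  (entry->num_dies + 1) * sizeof(*dies));
	if (!dies)
		return -1;
	entry->dies = dies;
	dies[entry->num_dies++] = die;
	return 0;
}

/* Return every DIE indexed under (tag, name); *num_ret is 0 if none. */
const uintptr_t *sketch_index_find(const struct sketch_index *index,
				   enum sketch_tag tag, const char *name,
				   size_t *num_ret)
{
	const struct sketch_entry *entry =
		sketch_map_find(&index->maps[tag], name);
	*num_ret = entry ? entry->num_dies : 0;
	return entry ? entry->dies : NULL;
}
```

With this shape, a nested class lookup can walk all of the DIEs returned
for the outer class name rather than relying on the single DIE that
happened to be indexed.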
Here are some performance results comparing the New version (this
commit) to the Old version (commit 16164dbe6e ("libdrgn: detect
flattened vmcores and raise error")). Application is either a large,
statically-linked C++ application or the live Linux kernel. Threads is
the OMP_NUM_THREADS setting used (the machine used for testing has 80
CPUs). Time is the amount of time it took to load and index debugging
information. Anon is the amount of anonymous (e.g., heap) memory used.
File is the amount of file memory used.

Application | Threads | Version | Time   | Anon   | File
------------+---------+---------+--------+--------+-------
Large C++   | 80      | New     | 5 s    | 3.5 GB | 1.4 GB
            |         | Old     | 15 s   | 5.2 GB | 1.7 GB
            | 8       | New     | 6.5 s  | 3.4 GB | 1.4 GB
            |         | Old     | 10 s   | 5.2 GB | 1.7 GB
            | 1       | New     | 30 s   | 3.4 GB | 1.4 GB
            |         | Old     | 51 s   | 5.2 GB | 1.7 GB
Linux       | 80      | New     | 270 ms | 128 MB | 300 MB
            |         | Old     | 380 ms | 73 MB  | 326 MB
            | 8       | New     | 240 ms | 115 MB | 300 MB
            |         | Old     | 240 ms | 73 MB  | 326 MB
            | 1       | New     | 700 ms | 87 MB  | 300 MB
            |         | Old     | 800 ms | 73 MB  | 326 MB

The results show that the new approach is almost always faster. For the
large C++ application, it is much better for both time and memory usage.
For the Linux kernel, it is slightly faster and uses more anonymous
memory, although that is partially offset by less file memory. (For the
Linux kernel, there is a dip in performance for both approaches from 8
threads to 80 which is worth looking into later.)
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In commit 26291647eb ("libdrgn: dwarf_index: handle
DW_AT_specification DIEs with two passes"), I claimed that the
specification map didn't need to be sharded "because there typically
aren't enough of these in a program to cause contention". This is true
for the Linux kernel, but not for large C++ applications. Instead of
sharding, though, we can avoid synchronization entirely by having each
indexing thread build its own specification map and then merging them at
the end. This reduces the time to index one large, statically-linked C++
application from 15 seconds to 8.5 seconds! As expected, it has no
significant performance difference for the Linux kernel.
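The build-then-merge idea can be sketched roughly as follows. This is
only an illustration under simplifying assumptions: the sketch_* names
are hypothetical, the per-thread "maps" are plain arrays, and the merged
result is a sorted array searched with bsearch() rather than a hash
table. The point is that no thread touches shared state until the single
merge step at the end.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a declaration DIE and the DIE that completes it. */
struct sketch_spec {
	uintptr_t declaration;
	uintptr_t definition;
};

/* What each indexing thread fills in privately, with no locking. */
struct sketch_spec_list {
	struct sketch_spec *specs;
	size_t num;
};

static int sketch_spec_cmp(const void *a, const void *b)
{
	const struct sketch_spec *sa = a, *sb = b;
	return (sa->declaration > sb->declaration) -
	       (sa->declaration < sb->declaration);
}

/* Merge the per-thread lists into one list sorted by declaration.
 * Returns 0 on success, -1 if out of memory. */
int sketch_merge_specs(const struct sketch_spec_list *per_thread,
		       size_t num_threads, struct sketch_spec_list *ret)
{
	size_t total = 0;
	for (size_t i = 0; i < num_threads; i++)
		total += per_thread[i].num;

	struct sketch_spec *merged = NULL;
	if (total > 0) {
		merged = malloc(total * sizeof(*merged));
		if (!merged)
			return -1;
	}

	size_t pos = 0;
	for (size_t i = 0; i < num_threads; i++) {
		/* A thread may have found nothing; don't memcpy() from NULL. */
		if (per_thread[i].num > 0) {
			memcpy(&merged[pos], per_thread[i].specs,
			       per_thread[i].num * sizeof(*merged));
			pos += per_thread[i].num;
		}
	}
	if (total > 0)
		qsort(merged, total, sizeof(*merged), sketch_spec_cmp);

	ret->specs = merged;
	ret->num = total;
	return 0;
}

/* Look up the definition recorded for a declaration, or NULL if none. */
const struct sketch_spec *sketch_find_spec(const struct sketch_spec_list *list,
					   uintptr_t declaration)
{
	struct sketch_spec key = { .declaration = declaration };
	if (list->num == 0)
		return NULL;
	return bsearch(&key, list->specs, list->num, sizeof(key),
		       sketch_spec_cmp);
}
```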
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The upcoming rework of the DWARF index needs entries in the DWARF index
to be as small as possible. The first thing we can get rid of is the
struct drgn_elf_file * in struct drgn_dwarf_index_die and struct
drgn_dwarf_specification. Instead, we can sort the struct
drgn_dwarf_index_cu_vector index_cus by start address, then do a binary
search on the DIE address to find the CU and file containing it.
As a result of this change, struct drgn_dwarf_index_die no longer
contains enough information for drgn_dwarf_index_get_die() to convert it
into a libdw Dwarf_Die. But, after the last two commits,
drgn_dwarf_index_get_die() is now always called immediately after
drgn_dwarf_index_iterator_next(). So, let's get rid of
drgn_dwarf_index_get_die() and make drgn_dwarf_index_iterator_next()
return the Dwarf_Die and struct drgn_elf_file *.
We offset the cost of the binary search in index_cus by storing the
libdw Dwarf_CU * in struct drgn_dwarf_index_cu. This allows us to avoid
calling dwarf_offdie{,_types}(), which does a (slower) binary tree
search to find the Dwarf_CU * anyways.
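For illustration, the lookup by DIE address could look something like
the sketch below. struct sketch_cu and sketch_find_cu() are hypothetical
stand-ins (the real struct drgn_dwarf_index_cu carries more state); the
only assumption is that the CUs are sorted by start address and don't
overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CU record. */
struct sketch_cu {
	uintptr_t start; /* address of the first byte of the CU */
	uintptr_t end; /* one past the last byte of the CU */
	const char *file; /* stand-in for the containing file */
};

/*
 * Find the CU containing die_addr in an array sorted by start address.
 * Returns NULL if the address is not inside any CU.
 */
const struct sketch_cu *sketch_find_cu(const struct sketch_cu *cus,
				       size_t num_cus, uintptr_t die_addr)
{
	/* Binary search for the first CU that starts after die_addr. */
	size_t lo = 0, hi = num_cus;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (cus[mid].start <= die_addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* The CU before that one is the only candidate. */
	if (lo == 0)
		return NULL;
	const struct sketch_cu *cu = &cus[lo - 1];
	return die_addr < cu->end ? cu : NULL;
}
```

Because the file can be recovered from the CU, an indexed DIE only needs
to store its address.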
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DWARF index iterators are used both for DIE lookups and namespace
lookups. Split the latter out into its own interface so that we can
simplify the former and support an upcoming rework of the DWARF index.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
When we encounter an incomplete struct, union, class, or enum type, we
try to find the complete definition by name. We also try to detect
whether the name is ambiguous, i.e., whether there are multiple distinct
types with that name. This is based on the DWARF index's deduplication
by filename: if the index contains more than one DIE matching the (name,
tag), then the type name was defined in more than one file, and
therefore it is ambiguous.
However, this breaks if the exact same definition came from different
paths. For example, a Linux kernel module built out-of-tree may use
different paths than the original kernel build. Other scenarios
involving the compilation directory could also affect this.
Furthermore, this check won't be feasible with an upcoming rework of the
DWARF index.
Let's drop the check and return the first match regardless of other
matches. Hopefully it doesn't matter too much in practice. If the wrong
type is returned, it can be worked around by casting to the correct type
looked up by filename.
Closes #186.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The upcoming DWARF index rework will make it too difficult to roll back
in the middle of DWARF indexing. It also doesn't make sense for the
planned module API. Let's chuck that code and instead save the error to
return forever like we do for index_namespace().
Signed-off-by: Omar Sandoval <osandov@osandov.com>
For many use cases of vectors, a full size_t for the size and capacity
isn't necessary and only adds memory overhead. Allow using any unsigned
integer type no larger than size_t, but continue to default to size_t.
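To make the trade-off concrete, here is a hedged sketch of a vector
generator that takes the size type as a parameter. SKETCH_DEFINE_VECTOR
and the names it generates are hypothetical and much simpler than the
real vector macros; the sketch only shows how a narrower size type
shrinks the structure and caps the number of entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical generator: defines struct <vector> and <vector>_append(). */
#define SKETCH_DEFINE_VECTOR(vector, entry_type, size_type)			\
_Static_assert((size_type)-1 > 0 && sizeof(size_type) <= sizeof(size_t),	\
	       "size_type must be unsigned and no larger than size_t");	\
struct vector {									\
	entry_type *data;							\
	size_type size;								\
	size_type capacity;							\
};										\
static inline bool vector##_append(struct vector *v, entry_type entry)		\
{										\
	if (v->size == v->capacity) {						\
		const size_type max = (size_type)-1;				\
		size_type new_capacity;						\
		if (v->capacity == max)						\
			return false; /* size_type can't count any higher */	\
		else if (v->capacity == 0)					\
			new_capacity = 1;					\
		else if (v->capacity > max / 2)					\
			new_capacity = max; /* clamp instead of overflowing */	\
		else								\
			new_capacity = v->capacity * 2;				\
		if (new_capacity > SIZE_MAX / sizeof(entry_type))		\
			return false;						\
		entry_type *new_data =						\
			realloc(v->data, new_capacity * sizeof(entry_type));	\
		if (!new_data)							\
			return false;						\
		v->data = new_data;						\
		v->capacity = new_capacity;					\
	}									\
	v->data[v->size++] = entry;						\
	return true;								\
}										\
struct vector /* swallows the semicolon after the macro invocation */

/* At most 255 entries, but only two bytes of bookkeeping. */
SKETCH_DEFINE_VECTOR(sketch_small_vector, int, uint8_t);
/* The default behavior: size and capacity are each a full size_t. */
SKETCH_DEFINE_VECTOR(sketch_big_vector, int, size_t);
```

On a typical 64-bit target, struct sketch_small_vector is 16 bytes and
struct sketch_big_vector is 24, which adds up when there is one vector
per indexed name.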
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Add type_if() and typedef_if() to a new header, generics.h. These will
be used for the upcoming vector variants.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The current generic vector API is pretty minimal and exposes its
internal members as part of the public interface. This has worked well
but prevents us from changing the vector implementation. In particular,
I'd like to have "small vector" variants that can store some entries
directly in the vector structure, use a smaller integer type for the
size and capacity, or both.
So, let's make the generated vector type "private" and add accessor
functions. This is very verbose in some cases, but it'll grant us much
more flexibility. While we're changing every user anyways, let's also
make use of _cleanup_(vector_deinit) where possible.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This has gotten in the way more than it has helped. I'll probably do the
same to min() and max() the next time they annoy me.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If a CU doesn't have a DW_AT_str_offsets_base attribute and the
.debug_str_offsets section is too short, then we'll try to dereference a
NULL Dwarf_Attribute pointer when reporting the error. Report that case
explicitly.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
If the call to drgn_debug_info_main_language() from
drgn_program_set_language_from_main() fails, then the latter needs to
bail, not write garbage from the stack into prog->lang, which will crash
later.
Fixes: 5591d199b1 ("libdrgn: debug_info: split DWARF support into its own file")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The makedumpfile flattened format is occasionally seen by users, but is
not read by libkdumpfile and thus unsupported by Drgn. A simple
'reassembly' process is all that is necessary to allow Drgn to open the
vmcore, but this fact isn't easily discoverable, resulting in issues
like #344. To help users, detect this when we're testing for kdump
signatures, and raise an error with reassembly instructions.
For further details on the flattened format, consult makedumpfile(8),
particularly the sections documenting options -F and -R.
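A minimal sketch of the kind of check involved is below. It assumes that
a flattened file begins with the ASCII signature "makedumpfile", which
is my understanding of the format rather than something stated above;
the function names and the wording of the error message are hypothetical
and are not the ones used by this change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed signature at the start of a flattened-format file. */
#define SKETCH_FLATTENED_SIGNATURE "makedumpfile"

/* Return whether the first len bytes of a file look like the flattened
 * format. */
bool sketch_is_flattened_vmcore(const void *buf, size_t len)
{
	const size_t sig_len = sizeof(SKETCH_FLATTENED_SIGNATURE) - 1;
	return len >= sig_len &&
	       memcmp(buf, SKETCH_FLATTENED_SIGNATURE, sig_len) == 0;
}

/* Return an error message with reassembly instructions, or NULL if the
 * file is not in the flattened format. */
const char *sketch_check_vmcore(const void *buf, size_t len)
{
	if (sketch_is_flattened_vmcore(buf, len)) {
		return "file is in the makedumpfile flattened format; "
		       "reassemble it first with "
		       "'makedumpfile -R NEW_VMCORE < FLATTENED_VMCORE'";
	}
	return NULL;
}
```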
Signed-off-by: Stephen Brennan <stephen@brennan.io>
This is useful for debugging the state of the program after loading
debugging information (e.g., debugging drgn with drgn!). For example:
load_debug_info --post-exec 'echo drgn -p $1; echo "prog_obj = Object(prog, \"struct drgn_program *\", $2)"; sleep +inf'
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Fix a missing error goto and print the time after the post-exec command.
Fixes: a21355eb69 ("libdrgn: examples: add --pre-exec and --post-exec options to load_debug_info")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The lack of a semicolon after these macros has always confused tooling
like cscope. We could add semicolons everywhere now, but let's enforce
it for the future, too. Let's add a dummy struct forward declaration at
the end of each macro that enforces this requirement and also provides a
useful error message.
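The trick looks roughly like this. SKETCH_DEFINE_COUNTER is a
hypothetical macro used only to demonstrate the pattern; the name of the
dummy struct is what shows up in the compiler's diagnostic if the
semicolon is forgotten.

```c
/*
 * Hypothetical definition macro. The trailing struct forward declaration
 * has no effect of its own, but it is only valid if the macro invocation
 * is followed by a semicolon.
 */
#define SKETCH_DEFINE_COUNTER(name)					\
static int name##_value;						\
static inline int name##_next(void)					\
{									\
	return ++name##_value;						\
}									\
struct sketch_DEFINE_COUNTER_needs_semicolon

/* Repeating the same forward declaration is harmless. */
SKETCH_DEFINE_COUNTER(sketch_request_id);
SKETCH_DEFINE_COUNTER(sketch_event_id);
```

Each invocation now reads like an ordinary declaration, which is also
what tools that parse C without expanding macros expect.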
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Once again, UBSan has reported the stupid undefined behavior of memcpy()
from a NULL source (even with a zero size). In fact, I fixed it in a
previous incarnation of this code in commit a17215e984 ("libdrgn:
dwarf_index: fix memcpy() undefined behavior").
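The usual shape of the fix is to skip the call when there is nothing to
copy. The helper below is a hypothetical illustration of that pattern,
not the code touched by this commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append n ints from src to dst starting at index *pos. src may be NULL
 * when n is 0 (e.g., an empty array that was never allocated).
 */
void sketch_append_ints(int *dst, size_t *pos, const int *src, size_t n)
{
	/*
	 * memcpy() requires valid pointers even for a zero size, so guard
	 * the call instead of relying on it being a no-op.
	 */
	if (n > 0) {
		memcpy(&dst[*pos], src, n * sizeof(*src));
		*pos += n;
	}
}
```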
Fixes: 0e6a0a5f94 ("libdrgn: dwarf_info: get rid of struct drgn_dwarf_index_pending_cu")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The Linux kernel's struct task_struct on AArch64 contains an array of
__uint128_t:
>>> task = find_task(prog, 1)
>>> task.type_
struct task_struct *
>>> task.thread.type_
struct thread_struct {
struct cpu_context cpu_context;
struct {
unsigned long tp_value;
unsigned long tp2_value;
struct user_fpsimd_state fpsimd_state;
} uw;
enum fp_type fp_type;
unsigned int fpsimd_cpu;
void *sve_state;
void *sme_state;
unsigned int vl[2];
unsigned int vl_onexec[2];
unsigned long fault_address;
unsigned long fault_code;
struct debug_info debug;
struct ptrauth_keys_user keys_user;
struct ptrauth_keys_kernel keys_kernel;
u64 mte_ctrl;
u64 sctlr_user;
u64 svcr;
u64 tpidr2_el0;
}
>>> task.thread.uw.fpsimd_state.type_
struct user_fpsimd_state {
__int128 unsigned vregs[32];
__u32 fpsr;
__u32 fpcr;
__u32 __reserved[2];
}
As a result, printing a task_struct fails:
>>> task
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/host/home/osandov/repos/drgn3/drgn/cli.py", line 140, in _displayhook
text = value.format_(columns=shutil.get_terminal_size((0, 0)).columns)
NotImplementedError: integer values larger than 64 bits are not yet supported
PR #311 suggested treating >64-bit integers as byte arrays for now; I
tried an alternate hack of handling >64-bit integers only in the
pretty-printing code. Both of these had issues, though.
Instead, let's push >64-bit integer support a little further and allow
storing "big integer" value objects. We still don't support any
operations on them, so this still doesn't complete #170. We store the
raw bytes of the value for now, but we'll probably change this if we add
support for operations (e.g., to store the value as an mp_limb_t array
for GMP). We also print >64-bit integer types in hexadecimal for
simplicity. This is inconsistent with the existing behavior of printing
in decimal, but more readable. In the future, we might want to add
heuristics to decide when to print in decimal vs hexadecimal for all
sizes.
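Printing from the raw bytes in hexadecimal is simple because each byte
maps to exactly two digits, with no multi-word division needed. A
minimal sketch, assuming an unsigned value stored in a given byte order
(sketch_format_big_uint() is a hypothetical helper, not the function
added here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an unsigned integer stored as raw bytes as "0x..." without
 * leading zeros. Returns the length of the string, or -1 if size is 0 or
 * buf is too small.
 */
int sketch_format_big_uint(const uint8_t *bytes, size_t size,
			   bool little_endian, char *buf, size_t buf_size)
{
	if (size == 0)
		return -1;

	/* Count significant bytes, keeping one so that zero prints as 0x0. */
	size_t n = size;
	while (n > 1 && bytes[little_endian ? n - 1 : size - n] == 0)
		n--;

	/* "0x", at most two digits per byte, and the terminating NUL. */
	if (buf_size < 2 + 2 * n + 1)
		return -1;

	int len = 0;
	/* Walk from the most significant byte to the least significant. */
	for (size_t i = n; i-- > 0;) {
		unsigned int byte = bytes[little_endian ? i : size - 1 - i];
		if (i == n - 1) {
			len += snprintf(buf + len, buf_size - len, "0x%x",
					byte);
		} else {
			len += snprintf(buf + len, buf_size - len, "%02x",
					byte);
		}
	}
	return len;
}
```

Decimal output would instead need repeated division of the whole value
by 10, which is part of why hexadecimal is the simpler choice for now.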
Closes #311.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
We have tons of cleanup code just for calling Py_DECREF(); this is a
perfect use case for a scope guard. Add it and use it everywhere that it
is straightforward to.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Kevin Svetlitski suggested making use of __attribute__((__cleanup__)) a
long time ago, and now that the kernel is doing it, I don't have a good
excuse not to. There are surprisingly only a handful of places that it
was straightforward to apply it to. A lot of potential uses are thwarted
by our policy that out parameters can be clobbered on failure, so that
may be something to revisit. Other cleanup guards will probably be more
useful, but this is just laying the groundwork for the future.
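For readers unfamiliar with the attribute (a GCC/Clang extension), here
is a small sketch of the pattern. SKETCH_CLEANUP, sketch_freep(), and
sketch_upper_copy() are hypothetical names for illustration, and the
sketch_num_freed counter exists only to make the behavior observable.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Run f(&variable) automatically when the variable goes out of scope. */
#define SKETCH_CLEANUP(f) __attribute__((__cleanup__(f)))

/* Demonstration only: counts how many non-NULL pointers were freed. */
int sketch_num_freed;

/* Cleanup handler: receives a pointer to the guarded pointer variable. */
static inline void sketch_freep(void *p)
{
	void *ptr = *(void **)p;
	if (ptr) {
		free(ptr);
		sketch_num_freed++;
	}
}

/*
 * Return a newly allocated uppercase copy of s, or NULL if s contains a
 * non-letter or memory allocation fails.
 */
char *sketch_upper_copy(const char *s)
{
	size_t len = strlen(s);
	SKETCH_CLEANUP(sketch_freep) char *copy = malloc(len + 1);
	if (!copy)
		return NULL;
	for (size_t i = 0; i < len; i++) {
		if (!isalpha((unsigned char)s[i]))
			return NULL; /* copy is freed automatically */
		copy[i] = toupper((unsigned char)s[i]);
	}
	copy[len] = '\0';
	/* Success: hand ownership to the caller and disarm the guard. */
	char *ret = copy;
	copy = NULL;
	return ret;
}
```

Note that the early return needs no explicit free() or goto, and that
the success path has to clear the guarded variable explicitly. That
hand-off is where the interaction with out parameters mentioned above
comes in.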
Signed-off-by: Omar Sandoval <osandov@osandov.com>
This is similar to commit 155ec92ef2 ("libdrgn: fix reading 32-bit
float object values on big-endian").
Fixes: 75c3679147 ("Rewrite drgn core in C")
Signed-off-by: Omar Sandoval <osandov@osandov.com>
The RHEL 7 kernel still uses `struct log *` for a structured kernel log
instead of `struct printk_log *`, so let's try to support it.
Tested on:
* `3.10.0-229`
* `3.10.0-1160.80.1`
* `3.10.0-1160.83.1`
This should have no impact on existing supported cases.
Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
We've addressed all of the smaller differences with GNU Debug Fission
and split DWARF 5, so now all that remains is the DWARF index.
The general approach is: in drgn_dwarf_index_read_cus(), for each CU,
ask libdw for the "sub-DIE". For skeleton CUs, this is the split CU DIE
from the .dwo file. From that Dwarf_Die, we can get the Dwarf_CU and
then the Dwarf handle. Then, we wrap that in a struct drgn_elf_file
(cached in a hash table in the struct drgn_module), which the DWARF
index can work with from there.
Additionally, a couple of places (.debug_addr parsing and stack trace
local variable lookup) need to be updated to use the correct
drgn_elf_file.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Split DWARF is challenging for the DWARF index for a couple of reasons:
1. We need libdw to look up the split files.
2. The file name table comes from the skeleton file, but everything else
relevant to the index comes from the split file.
(1) requires the index to use libdw to get the CU DIE. Unfortunately,
due to the overhead of libdw, this makes the indexing step 5-10% slower.
On the plus side, getting the CU DIE upfront simplifies quite a bit: we
can read the file name table, compilation directory, and str_offsets
base before indexing, which makes supporting (2) possible.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
In the next change, we'll need more information about the unit, and
there's no benefit to doing it ourselves anymore.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Instead, reuse struct drgn_dwarf_index_cu for the pending CUs. This is
mainly so that we can save more information in the pending CU in a later
change. It also lets us merge our per-thread pending CU arrays with
memcpy() instead of element-by-element, but I didn't measure a
performance difference one way or the other.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
There are a couple of differences with non-split DWARF 5:
- DW_AT_addr_base/DW_AT_GNU_addr_base is in the skeleton DIE, so we need
to use dwarf_attr_integrate().
- GNU Debug Fission for DWARF 4 doesn't have headers in .debug_addr.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
dwarf_module_find_dwarf_scopes() and drgn_dwarf_die_iterator_next() just
need to go from skeleton units to split units. We need to use
dwarf_cu_info(), which was added in 0.171, which incidentally was when
elfutils gained split DWARF support anyways.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
It seems like GCC omits this for split units when using DWARF 5,
intending it to mean the first entry in .debug_loclists.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
GNU Debug Fission doesn't have DW_AT_str_offsets_base but does have
.debug_str_offsets. GCC doesn't emit DW_AT_str_offsets_base for DWARF 5
split DWARF. In both cases, the default is the first entry in
.debug_str_offsets.
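As an illustration of what "the first entry" means, the sketch below
computes a default base from the DWARF version and offset size. It
assumes the DWARF 5 .debug_str_offsets header layout as I understand it
(unit length, 2-byte version, 2 bytes of padding) and no header at all
for GNU Debug Fission; the function is hypothetical and may not match
how this change computes the default.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Offset of the first entry in .debug_str_offsets, to be used when the
 * CU has no DW_AT_str_offsets_base attribute.
 */
uint64_t sketch_default_str_offsets_base(int version, bool is_64_bit)
{
	/* GNU Debug Fission (DWARF 4): the entries start immediately. */
	if (version < 5)
		return 0;
	/*
	 * DWARF 5: skip the header. 32-bit DWARF has a 4-byte unit length;
	 * 64-bit DWARF has a 12-byte one. Both are followed by a 2-byte
	 * version and 2 bytes of padding.
	 */
	return is_64_bit ? 16 : 8;
}
```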
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Now that drgn is hooked up to log to the logging module, let's configure
the logging module to print logs nicely and add a --log-level command
line option. This makes the quiet parameter to run_interactive()
redundant, so we ignore it now and will remove it in a future release.
I'm not sure whether we should expose the log formatter, or maybe
run_interactive() should also set up the logger. I may also want to
break download progress out into a separate option from --quiet and then
make --quiet equivalent to --log-level=none --progress=never. All of
that can happen later.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Rather than coming up with our own, separate logging API for the Python
bindings, let's integrate with the logging module. The straightforward
part is creating a logger from the C extension and adding a log callback
that calls its log() method. However, syncing the log level between the
logging module and libdrgn requires monkey patching.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Exceptions aren't enough to debug complicated code paths like debug info
discovery or stack unwinding. We really need logs for that, so let's add
a small logging framework. By default, we log to stderr, but we also
provide a way to direct logs to a different file, or even an arbitrary
callback so that logs can be directed to the application's logging
library of choice.
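The overall shape is roughly the following. This is a simplified sketch
with hypothetical sketch_* names, not the API added by this commit: a
logger has a minimum level plus a callback and an opaque argument, and
the default callback writes to a FILE * (stderr unless changed).

```c
#include <stdarg.h>
#include <stdio.h>

enum sketch_log_level {
	SKETCH_LOG_DEBUG,
	SKETCH_LOG_INFO,
	SKETCH_LOG_WARNING,
	SKETCH_LOG_ERROR,
};

typedef void sketch_log_fn(enum sketch_log_level level, const char *message,
			   void *arg);

struct sketch_logger {
	enum sketch_log_level level; /* messages below this are dropped */
	sketch_log_fn *fn;
	void *arg;
};

/* Default sink: arg is the FILE * to write to. */
void sketch_log_to_file(enum sketch_log_level level, const char *message,
			void *arg)
{
	static const char * const names[] = {
		"debug", "info", "warning", "error",
	};
	fprintf(arg, "%s: %s\n", names[level], message);
}

/* Example application sink: arg is a char[256] receiving the message. */
void sketch_log_to_buffer(enum sketch_log_level level, const char *message,
			  void *arg)
{
	(void)level;
	snprintf(arg, 256, "%s", message);
}

/* By default, log warnings and above to stderr. */
void sketch_logger_init(struct sketch_logger *logger)
{
	logger->level = SKETCH_LOG_WARNING;
	logger->fn = sketch_log_to_file;
	logger->arg = stderr;
}

/* Format a message (truncated to 255 bytes) and pass it to the sink. */
void sketch_log(const struct sketch_logger *logger,
		enum sketch_log_level level, const char *format, ...)
{
	if (level < logger->level)
		return;
	char message[256];
	va_list ap;
	va_start(ap, format);
	vsnprintf(message, sizeof(message), format, ap);
	va_end(ap);
	logger->fn(level, message, logger->arg);
}
```

Redirecting to a different file only means changing arg; plugging in an
application's logging library means replacing fn, as
sketch_log_to_buffer() does in miniature.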
Signed-off-by: Omar Sandoval <osandov@osandov.com>
DWARF indexing can take a long time; Kevin Svetlitski notes that it can
take almost a minute on some large binaries. Let's use the new blocking
API around it so that the Python bindings drop the GIL.
Closes #247.
Suggested-by: Kevin Svetlitski <svetlitski@meta.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>