Commit Graph

15 Commits

Author SHA1 Message Date
Omar Sandoval
c4a122ead6 libdrgn: dwarf_info: scalably index all DIEs per name
We currently deduplicate entries in the DWARF index by (name, tag, file
name). We want to add support for looking up nested classes, so this is
a problem: not every DIE defining a class also defines all of its nested
types, so the one DIE we index may not allow us to find every nested
class. Instead, we need to index every DIE with a given name.

This sounds horribly expensive, both in terms of CPU and memory, but we
can mitigate this in several ways:

- We no longer need to parse the file name table, cache file name
  hashes, parse DW_AT_decl_file, or store the file name hash for indexed
  DIEs.
- Instead of storing the tag for each indexed DIE, we can split the DIE
  map into a map per tag.
- We can store the DIEs matching a name in a vector instead of a linked
  list.
- We can use the new inline entry and small size variants of vectors.
- We can move struct drgn_namespace_dwarf_index * to a tree separate
  from the indexed DIEs. After all of these changes, we only need a
  single uintptr_t per indexed DIE.
- We can get rid of the struct drgn_dwarf_index_pending_die list for a
  namespace and use the indexed DIEs instead, which are half the size.
- DW_TAG_base_type DIEs can be assumed to be globally unique by name, so
  they can be stored in their own map holding one DIE per name.
- Each thread can independently build the DIE maps without any
  synchronization to be merged at the end.
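
As a rough illustration of the "vector with an inline entry" idea from the list above, here is a minimal sketch of a per-name DIE list that stores a single DIE inline and only allocates once a second DIE with the same name appears. The names struct die_list, die_list_append(), and die_list_entries() are hypothetical and are not libdrgn's actual vector API; the per-tag maps from names to these lists are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical per-name DIE list (not libdrgn's real type). One DIE is
 * stored inline; two or more spill to a heap array. Each indexed DIE costs
 * a single uintptr_t (its address).
 */
struct die_list {
	size_t size;
	size_t capacity; /* Only meaningful once the list has spilled. */
	union {
		uintptr_t inline_die; /* Used while size <= 1. */
		uintptr_t *dies;      /* Used once size >= 2. */
	};
};

/* Append a DIE address. Returns 0 on success, -1 if allocation fails. */
static int die_list_append(struct die_list *list, uintptr_t die)
{
	if (list->size == 0) {
		list->inline_die = die;
	} else if (list->size == 1) {
		/* Spill: move the inline entry into a heap array. */
		uintptr_t *dies = malloc(4 * sizeof(*dies));
		if (!dies)
			return -1;
		dies[0] = list->inline_die;
		dies[1] = die;
		list->dies = dies;
		list->capacity = 4;
	} else {
		if (list->size == list->capacity) {
			size_t new_capacity = 2 * list->capacity;
			uintptr_t *dies = realloc(list->dies,
						  new_capacity * sizeof(*dies));
			if (!dies)
				return -1;
			list->dies = dies;
			list->capacity = new_capacity;
		}
		list->dies[list->size] = die;
	}
	list->size++;
	return 0;
}

/* Return a pointer to the first entry, whether inline or on the heap. */
static const uintptr_t *die_list_entries(const struct die_list *list)
{
	return list->size <= 1 ? &list->inline_die : list->dies;
}
```

Names that are defined by only one DIE never touch the allocator in this sketch, which is the case the inline variant is meant to make cheap.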

Here are some performance results comparing the New version (this
commit) to the Old version (commit 16164dbe6e ("libdrgn: detect
flattened vmcores and raise error")). Application is either a large,
statically-linked C++ application or the live Linux kernel. Threads is
the OMP_NUM_THREADS setting used (the machine used for testing has 80
CPUs). Time is the amount of time it took to load and index debugging
information. Anon is the amount of anonymous (e.g., heap) memory used.
File is the amount of file memory used.

Application | Threads | Version | Time   | Anon   | File
------------+---------+---------+--------+--------+-------
Large C++   | 80      | New     | 5 s    | 3.5 GB | 1.4 GB
            |         | Old     | 15 s   | 5.2 GB | 1.7 GB
            | 8       | New     | 6.5 s  | 3.4 GB | 1.4 GB
            |         | Old     | 10 s   | 5.2 GB | 1.7 GB
            | 1       | New     | 30 s   | 3.4 GB | 1.4 GB
            |         | Old     | 51 s   | 5.2 GB | 1.7 GB
Linux       | 80      | New     | 270 ms | 128 MB | 300 MB
            |         | Old     | 380 ms | 73 MB  | 326 MB
            | 8       | New     | 240 ms | 115 MB | 300 MB
            |         | Old     | 240 ms | 73 MB  | 326 MB
            | 1       | New     | 700 ms | 87 MB  | 300 MB
            |         | Old     | 800 ms | 73 MB  | 326 MB

The results show that the new approach is faster in every configuration
except the Linux kernel with 8 threads, where the two are tied. For the
large C++ application, it is 1.5 to 3 times faster and uses about a
third less anonymous memory. For the Linux kernel, it is slightly faster
and uses more anonymous memory, although that is partially offset by
less file memory. (For the Linux kernel, there is a dip in performance
for both approaches from 8 threads to 80 which is worth looking into
later.)

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-16 14:15:03 -07:00
Omar Sandoval
4d85f4a7cc libdrgn: dwarf_info: index specifications per thread
In commit 26291647eb ("libdrgn: dwarf_index: handle
DW_AT_specification DIEs with two passes"), I claimed that the
specification map didn't need to be sharded "because there typically
aren't enough of these in a program to cause contention". This is true
for the Linux kernel, but not for large C++ applications. Instead of
sharding, though, we can avoid synchronization entirely by having each
indexing thread build its own specification map and then merging them at
the end. This reduces the time to index one large, statically-linked C++
application from 15 seconds to 8.5 seconds! As expected, it has no
significant performance difference for the Linux kernel.
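
The per-thread approach described above can be sketched roughly as follows. This is not libdrgn's hash table implementation: struct spec, struct spec_map, and the spec_map_*() functions are hypothetical, the table is a fixed-size open-addressing map, and a declaration address of 0 is assumed to mean "empty slot". Each indexing thread would insert into its own map with no locking, and a single thread would then fold the maps together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a declaration DIE address and the DIE specifying it. */
struct spec {
	uintptr_t declaration; /* 0 marks an empty slot in this sketch. */
	uintptr_t definition;
};

#define SPEC_MAP_SLOTS 64 /* Power of two; far too small for real use. */

struct spec_map {
	struct spec slots[SPEC_MAP_SLOTS];
};

static size_t spec_hash(uintptr_t key)
{
	return (size_t)(key >> 3) & (SPEC_MAP_SLOTS - 1);
}

/* Insert or overwrite an entry. Returns -1 if the table is full. */
static int spec_map_insert(struct spec_map *map, struct spec entry)
{
	size_t start = spec_hash(entry.declaration);
	for (size_t n = 0; n < SPEC_MAP_SLOTS; n++) {
		struct spec *slot = &map->slots[(start + n) % SPEC_MAP_SLOTS];
		if (slot->declaration == 0 ||
		    slot->declaration == entry.declaration) {
			*slot = entry;
			return 0;
		}
	}
	return -1;
}

/* Return the definition for a declaration, or 0 if it is not in the map. */
static uintptr_t spec_map_find(const struct spec_map *map,
			       uintptr_t declaration)
{
	size_t start = spec_hash(declaration);
	for (size_t n = 0; n < SPEC_MAP_SLOTS; n++) {
		const struct spec *slot =
			&map->slots[(start + n) % SPEC_MAP_SLOTS];
		if (slot->declaration == declaration)
			return slot->definition;
		if (slot->declaration == 0)
			break;
	}
	return 0;
}

/*
 * Merge step, run by one thread after all indexing threads have finished:
 * copy every occupied slot of src into dst.
 */
static int spec_map_merge(struct spec_map *dst, const struct spec_map *src)
{
	for (size_t i = 0; i < SPEC_MAP_SLOTS; i++) {
		if (src->slots[i].declaration != 0 &&
		    spec_map_insert(dst, src->slots[i]) < 0)
			return -1;
	}
	return 0;
}
```

The point of the design is that the hot path (insertion during indexing) touches only thread-local memory; the only serialized work is the final merge.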

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-16 14:15:03 -07:00
Omar Sandoval
2ad157a795 libdrgn: dwarf_info: don't store file in DWARF index entries
The upcoming rework of the DWARF index needs entries in the DWARF index
to be as small as possible. The first thing we can get rid of is the
struct drgn_elf_file * in struct drgn_dwarf_index_die and struct
drgn_dwarf_specification. Instead, we can sort the struct
drgn_dwarf_index_cu_vector index_cus by start address, then do a binary
search on the DIE address to find the CU and file containing it.

As a result of this change, struct drgn_dwarf_index_die no longer
contains enough information for drgn_dwarf_index_get_die() to convert it
into a libdw Dwarf_Die. But, after the last two commits,
drgn_dwarf_index_get_die() is now always called immediately after
drgn_dwarf_index_iterator_next(). So, let's get rid of
drgn_dwarf_index_get_die() and make drgn_dwarf_index_iterator_next()
return the Dwarf_Die and struct drgn_elf_file *.

We offset the cost of the binary search in index_cus by storing the
libdw Dwarf_CU * in struct drgn_dwarf_index_cu. This allows us to avoid
calling dwarf_offdie{,_types}(), which does a (slower) binary tree
search to find the Dwarf_CU * anyways.
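
A minimal sketch of the lookup described above, assuming the CUs are kept in an array sorted by start address. struct index_cu, its fields, and find_cu() are hypothetical stand-ins rather than the real struct drgn_dwarf_index_cu; in particular, file_id stands in for the struct drgn_elf_file * and the explicit size check is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an indexed CU. */
struct index_cu {
	uintptr_t start; /* Address of the first byte of the CU's data. */
	size_t size;     /* Length of the CU's data in bytes. */
	int file_id;     /* Stand-in for the file the CU came from. */
};

/*
 * Return the CU whose [start, start + size) range contains addr, or NULL
 * if there is none. cus must be sorted by start address.
 */
static const struct index_cu *find_cu(const struct index_cu *cus,
				      size_t num_cus, uintptr_t addr)
{
	size_t lo = 0, hi = num_cus;

	/* Find the index of the first CU with start > addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (cus[mid].start <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL; /* addr is before the first CU. */

	/* The candidate is the last CU with start <= addr. */
	const struct index_cu *cu = &cus[lo - 1];
	return addr - cu->start < cu->size ? cu : NULL;
}
```

Because an indexed DIE is just an address into a CU's data, this lookup recovers both the CU and its file without storing either in the index entry.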

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-16 14:15:03 -07:00
Omar Sandoval
919e95e2d5 libdrgn: dwarf_info: make drgn_dwarf_index_state::max_threads an int
It doesn't really matter, but the return type of omp_get_max_threads() is int.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-16 14:15:03 -07:00
Omar Sandoval
c8406e1ea0 libdrgn: require semicolon after DEFINE_{HASH,VECTOR,BINARY_SEARCH_TREE}*
The lack of a semicolon after these macros has always confused tooling
like cscope. We could add semicolons everywhere now, but let's enforce
it for the future, too. Let's add a dummy struct forward declaration at
the end of each macro that enforces this requirement and also provides a
useful error message.
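
The trick can be sketched with a toy macro. DEFINE_COUNTER and the names it generates are hypothetical and exist only for this illustration; the real macros define hash tables, vectors, and binary search trees. The macro body ends with a struct forward declaration that has no terminating semicolon, so the caller has to supply one.

```c
#include <assert.h>

/*
 * Hypothetical macro that defines a counter. The trailing forward
 * declaration deliberately lacks a semicolon, so every use of the macro
 * must be followed by one.
 */
#define DEFINE_COUNTER(name)				\
static int name##_value;				\
static inline int name##_next(void)			\
{							\
	return ++name##_value;				\
}							\
struct DEFINE_COUNTER_needs_semicolon

/* Redeclaring the same incomplete struct is legal, so repeated use is fine. */
DEFINE_COUNTER(widget);
DEFINE_COUNTER(gadget);
```

If the semicolon is left out, the forward declaration runs into whatever follows it, and the resulting compiler error should point at a declaration involving DEFINE_COUNTER_needs_semicolon, which hints at the fix.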

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-08-02 14:54:59 -07:00
Omar Sandoval
0e6a0a5f94 libdrgn: dwarf_info: get rid of struct drgn_dwarf_index_pending_cu
Instead, reuse struct drgn_dwarf_index_cu for the pending CUs. This is
mainly so that we can save more information in the pending CU in a later
change. It also lets us merge our per-thread pending CU arrays with
memcpy() instead of element-by-element, but I didn't measure a
performance difference one way or the other.
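
For illustration, merging per-thread arrays of a plain struct with memcpy() might look like the following. struct pending_cu, struct cu_array, and merge_cu_arrays() are hypothetical names with made-up fields, not the real struct drgn_dwarf_index_cu or libdrgn's vector type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified pending CU. */
struct pending_cu {
	unsigned long offset; /* Offset of the CU in its section. */
	int file_id;          /* Stand-in for the file it belongs to. */
};

/* One thread's array of pending CUs. */
struct cu_array {
	const struct pending_cu *cus;
	size_t count;
};

/*
 * Concatenate the per-thread arrays into one newly allocated array.
 * Returns NULL if allocation fails. The caller frees the result.
 */
static struct pending_cu *merge_cu_arrays(const struct cu_array *per_thread,
					  size_t num_threads,
					  size_t *total_ret)
{
	size_t total = 0;
	for (size_t i = 0; i < num_threads; i++)
		total += per_thread[i].count;

	/* Allocate at least one element so an empty merge is not NULL. */
	struct pending_cu *merged =
		malloc((total ? total : 1) * sizeof(*merged));
	if (!merged)
		return NULL;

	size_t offset = 0;
	for (size_t i = 0; i < num_threads; i++) {
		/* Skip empty arrays: memcpy() from a NULL source is undefined. */
		if (per_thread[i].count) {
			memcpy(merged + offset, per_thread[i].cus,
			       per_thread[i].count * sizeof(*merged));
		}
		offset += per_thread[i].count;
	}
	*total_ret = total;
	return merged;
}
```

This only works because the pending and final CUs share one element type; with two different structs, each element would have to be converted individually.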

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-07-19 10:10:08 -07:00
Omar Sandoval
fc3ea4184a libdrgn: use new include-what-you-use exported declarations and fix warnings
include-what-you-use/include-what-you-use#1164 fixed
include-what-you-use/include-what-you-use#971 so that we can export
forward declarations instead of hacking around it. I can't reproduce the
issue with BINARY_OP_SIGNED_2C anymore either, so we can remove that
hack, too. Also fix any other warnings.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2023-05-24 00:25:25 -07:00
Omar Sandoval
18b12a5c7b libdrgn: get .eh_frame from the correct file
We're currently getting .eh_frame from the debug file. However, since
.eh_frame is an SHF_ALLOC section, it is actually in the loaded file,
and may not be in the debug file. This causes us to fail to unwind in
modules whose debug file was created with objcopy --only-keep-debug
(which is typical for Linux distro debug files).

Fix it by getting .eh_frame from the loaded file. To make this easier,
we split .eh_frame and .debug_frame data into two separate tables. We
also don't bother deduplicating them anymore, since GCC and Clang only
seem to generate one or the other in practice.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 13:37:29 -08:00
Omar Sandoval
34f122144a libdrgn: debug_info: wrap ELF file information in new struct drgn_elf_file
struct drgn_module contains a bunch of information about the debug info
file. Let's pull it out into its own structure, struct drgn_elf_file.
This will be reused for the "main"/"loaded" file in an upcoming change.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-28 13:37:29 -08:00
Stephen Brennan
5f3a91f80d Add StackFrame.locals() method
The StackFrame's __getitem__() method allows looking up names in the
scope of a stack frame, which is an incredibly useful tool for
debugging. However, the names are not discoverable -- you must already
be looking at the source code or some other source to know what names
can be queried. To fix this, add a locals() method to StackFrame, which
lists names that can be queried in the scope. Since this method is named
locals(), it stops at the function scope and doesn't include globals or
class members.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
2022-11-02 22:40:33 -07:00
Omar Sandoval
87b7292aa5 Relicense drgn from GPLv3+ to LGPLv2.1+
drgn is currently licensed as GPLv3+. Part of the long term vision for
drgn is that other projects can use it as a library providing
programmatic interfaces for debugger functionality. A more permissive
license is better suited to this goal. We decided on LGPLv2.1+ as a good
balance between software freedom and permissiveness.

All contributors not employed by Meta were contacted via email and
consented to the license change. The only exception was the author of
commit c4fbf7e589 ("libdrgn: fix for compilation error"), who did not
respond. That commit reverted a single line of code to one originally
written by me in commit 640b1c011d ("libdrgn: embed DWARF index in
DWARF info cache").

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-11-01 17:05:16 -07:00
Omar Sandoval
70af25849c libdrgn: rename drgn_debug_info_module to drgn_module
Eventually, modules will be exposed as part of the public libdrgn API,
so they should have a clean name. Additionally, the module API I'm
currently working on will allow modules for which we don't have the
debug info file, so "debug info module" would be a misnomer.

Also rename drgn_dwarf_module_info to drgn_module_dwarf_info and
drgn_orc_module_info to drgn_module_orc_info to fit the new naming
scheme better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2022-10-05 16:52:46 -07:00
Omar Sandoval
c0d8709b45 Update copyright headers to Meta
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-21 15:59:44 -08:00
Omar Sandoval
c3f31e28f9 libdrgn: reorganize and move DWARF index into dwarf_info.c
The upcoming introduction of a higher level data structure to represent
a namespace has implications on the organization of the DWARF index and
debug info management code. Basically, we're going to want to track what
is currently known as struct drgn_dwarf_index_namespace as part of the
new struct drgn_namespace. That only leaves the DWARF specification map
and list of CUs in struct drgn_dwarf_index, which doesn't make much
sense anymore. Instead, let's:

* Move the specification map and CUs into struct drgn_dwarf_info.
* Rename struct drgn_dwarf_index_namespace to struct
  drgn_namespace_dwarf_index to indicate that it is the "DWARF index for
  a namespace" rather than a "namespace of a DWARF index".
* Move the DWARF index implementation into dwarf_info.c. The DWARF index
  and debugging information management have always been coupled, so this
  makes it more explicit and is more convenient.
* Improve documentation and naming in the DWARF index implementation.

Now, the only DWARF-specific code outside of dwarf_info.c is for stack
tracing, but we'll leave that for another day.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-18 15:08:55 -08:00
Omar Sandoval
5591d199b1 libdrgn: debug_info: split DWARF support into its own file
Continuing the refactoring from the previous commit, move the DWARF code
from debug_info.c to its own file, leaving only the generic ELF file
management in debug_info.c.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2021-11-18 15:08:54 -08:00