JakeHillion/drgn

mirror of https://github.com/JakeHillion/drgn.git synced 2024-12-23 01:33:06 +00:00

Author	SHA1	Message	Date
Omar Sandoval	c4a122ead6	libdrgn: dwarf_info: scalably index all DIEs per name

We currently deduplicate entries in the DWARF index by (name, tag, file name). We want to add support for looking up nested classes, so this is a problem: not every DIE defining a class also defines all of its nested types, so the one DIE we index may not allow us to find every nested class. Instead, we need to index every DIE with a given name.

This sounds horribly expensive, both in terms of CPU and memory, but we can mitigate this in several ways:

- We no longer need to parse the file name table, cache file name hashes, parse DW_AT_decl_file, or store the file name hash for indexed DIEs.
- Instead of storing the tag for each indexed DIE, we can split the DIE map into a map per tag.
- We can store the DIEs matching a name in a vector instead of a linked list.
- We can use the new inline entry and small size variants of vectors.
- We can move struct drgn_namespace_dwarf_index * to a tree separate from the indexed DIEs. After all of these changes, we only need a single uintptr_t per indexed DIE.
- We can get rid of the struct drgn_dwarf_index_pending_die list for a namespace and use the indexed DIEs instead, which are half the size.
- DW_TAG_base_type maps can be assumed to be globally unique, so they can be stored in their own map of one DIE indexed only by name.
- Each thread can independently build the DIE maps without any synchronization to be merged at the end.

Here are some performance results comparing the New version (this commit) to the Old version (commit `16164dbe6e` ("libdrgn: detect flattened vmcores and raise error")). Application is either a large, statically-linked C++ application or the live Linux kernel. Threads is the OMP_NUM_THREADS setting used (the machine used for testing has 80 CPUs). Time is the amount of time it took to load and index debugging information. Anon is the amount of anonymous (e.g., heap) memory used. File is the amount of file memory used.

Application | Threads | Version | Time   | Anon   | File
------------+---------+---------+--------+--------+-------
Large C++   | 80      | New     | 5 s    | 3.5 GB | 1.4 GB
            |         | Old     | 15 s   | 5.2 GB | 1.7 GB
            | 8       | New     | 6.5 s  | 3.4 GB | 1.4 GB
            |         | Old     | 10 s   | 5.2 GB | 1.7 GB
            | 1       | New     | 30 s   | 3.4 GB | 1.4 GB
            |         | Old     | 51 s   | 5.2 GB | 1.7 GB
Linux       | 80      | New     | 270 ms | 128 MB | 300 MB
            |         | Old     | 380 ms | 73 MB  | 326 MB
            | 8       | New     | 240 ms | 115 MB | 300 MB
            |         | Old     | 240 ms | 73 MB  | 326 MB
            | 1       | New     | 700 ms | 87 MB  | 300 MB
            |         | Old     | 800 ms | 73 MB  | 326 MB

The results show that the new approach is almost always faster. For the large C++ application, it is much better for both time and memory usage. For the Linux kernel, it is slightly faster and uses more anonymous memory, although that is partially offset by less file memory. (For the Linux kernel, there is a dip in performance for both approaches from 8 threads to 80 which is worth looking into later.)

Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-08-16 14:15:03 -07:00
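The index layout described in this commit message (one map per tag, every DIE with a given name kept in a vector, and base types stored as a single DIE per name) can be illustrated with a small Python sketch. This is not libdrgn's C implementation: the `(tag, name, address)` tuples, `build_index`, and `BASE_TYPE` are hypothetical names chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical constant for this sketch; libdrgn uses the numeric DWARF tag.
BASE_TYPE = "DW_TAG_base_type"


def build_index(dies):
    """Index DIEs given as (tag, name, address) tuples.

    Returns (per_tag, base_types), where per_tag[tag][name] lists every
    address seen for that name (no deduplication), and base_types[name]
    holds a single address because base types are assumed globally unique.
    """
    per_tag = defaultdict(lambda: defaultdict(list))
    base_types = {}
    for tag, name, addr in dies:
        if tag == BASE_TYPE:
            # Keep only the first DIE seen for each base type name.
            base_types.setdefault(name, addr)
        else:
            # The tag is implied by which map the entry lives in, so only
            # the address needs to be stored per indexed DIE.
            per_tag[tag][name].append(addr)
    return per_tag, base_types
```

Because every DIE named `Foo` is retained, a later lookup can visit each of them until it finds the one that actually defines a given nested type.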
Omar Sandoval	4d85f4a7cc	libdrgn: dwarf_info: index specifications per thread In commit `26291647eb` ("libdrgn: dwarf_index: handle DW_AT_specification DIEs with two passes"), I claimed that the specification map didn't need to be sharded "because there typically aren't enough of these in a program to cause contention". This is true for the Linux kernel, but not for large C++ applications. Instead of sharding, though, we can avoid synchronization entirely by having each indexing thread build its own specification map and then merging them at the end. This reduces the time to index one large, statically-linked C++ application from 15 seconds to 8.5 seconds! As expected, it has no significant performance difference for the Linux kernel. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-08-16 14:15:03 -07:00
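The approach in this commit (each indexing thread builds its own map, and the maps are merged once at the end) can be sketched in Python as follows. libdrgn does this in C with OpenMP; the `ThreadPoolExecutor`, the function names, and the `(definition_addr, declaration_addr)` pair format below are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def index_chunk(chunk):
    """Build a private specification map for one chunk of work.

    Each item is a hypothetical (definition_addr, declaration_addr) pair,
    standing in for a DIE whose DW_AT_specification points at a declaration.
    """
    local = {}
    for definition_addr, declaration_addr in chunk:
        local[declaration_addr] = definition_addr
    return local


def index_specifications(chunks, max_threads=4):
    """Index chunks in parallel, then merge the per-worker maps."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # No shared state is touched while indexing, so no locks are needed.
        partial_maps = list(pool.map(index_chunk, chunks))
    merged = {}
    for partial in partial_maps:
        # The merge happens on a single thread after all workers finish.
        merged.update(partial)
    return merged
```

The trade-off is extra memory for the temporary per-worker maps in exchange for removing contention on a shared map during indexing.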
Omar Sandoval	2ad157a795	libdrgn: dwarf_info: don't store file in DWARF index entries The upcoming rework of the DWARF index needs entries in the DWARF index to be as small as possible. The first thing we can get rid of is the struct drgn_elf_file * in struct drgn_dwarf_index_die and struct drgn_dwarf_specification. Instead, we can sort the struct drgn_dwarf_index_cu_vector index_cus by start address, then do a binary search on the DIE address to find the CU and file containing it. As a result of this change, struct drgn_dwarf_index_die no longer contains enough information for drgn_dwarf_index_get_die() to convert it into a libdw Dwarf_Die. But, after the last two commits, drgn_dwarf_index_get_die() is now always called immediately after drgn_dwarf_index_iterator_next(). So, let's get rid of drgn_dwarf_index_get_die() and make drgn_dwarf_index_iterator_next() return the Dwarf_Die and struct drgn_elf_file *. We offset the cost of the binary search in index_cus by storing the libdw Dwarf_CU in struct drgn_dwarf_index_cu. This allows us to avoid calling dwarf_offdie{,_types}(), which does a (slower) binary tree search to find the Dwarf_CU * anyways. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-08-16 14:15:03 -07:00
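The lookup described in this commit (sort the CUs by start address, then binary search on the DIE address to recover the containing CU and file) can be sketched with Python's `bisect` module. The `CuTable` class and the `(start, end, file_name)` tuples are hypothetical stand-ins for `struct drgn_dwarf_index_cu`, not the actual libdrgn data structures.

```python
import bisect


class CuTable:
    """CUs sorted by start address, searchable by DIE address."""

    def __init__(self, cus):
        # Each CU is a hypothetical (start, end, file_name) tuple with a
        # half-open address range [start, end).
        self.cus = sorted(cus, key=lambda cu: cu[0])
        self.starts = [cu[0] for cu in self.cus]

    def find(self, die_addr):
        """Return (cu_start, file_name) for the CU containing die_addr."""
        # Rightmost CU whose start address is <= die_addr.
        i = bisect.bisect_right(self.starts, die_addr) - 1
        if i < 0:
            return None
        start, end, file_name = self.cus[i]
        if die_addr >= end:
            # The address falls in a gap between CUs.
            return None
        return start, file_name
```

With this arrangement an index entry only needs to store the DIE address; the file is recovered on demand in O(log n) time.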
Omar Sandoval	919e95e2d5	libdrgn: dwarf_info: make drgn_dwarf_index_state::max_threads an int It doesn't really matter, but the return type of omp_get_max_threads() is int. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-08-16 14:15:03 -07:00
Omar Sandoval	c8406e1ea0	libdrgn: require semicolon after DEFINE_{HASH,VECTOR,BINARY_SEARCH_TREE}* The lack of a semicolon after these macros has always confused tooling like cscope. We could add semicolons everywhere now, but let's enforce it for the future, too. Let's add a dummy struct forward declaration at the end of each macro that enforces this requirement and also provides a useful error message. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-08-02 14:54:59 -07:00
Omar Sandoval	0e6a0a5f94	libdrgn: dwarf_info: get rid of struct drgn_dwarf_index_pending_cu Instead, reuse struct drgn_dwarf_index_cu for the pending CUs. This is mainly so that we can save more information in the pending CU in a later change. It also lets us merge our per-thread pending CU arrays with memcpy() instead of element-by-element, but I didn't measure a performance difference one way or the other. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-07-19 10:10:08 -07:00
Omar Sandoval	fc3ea4184a	libdrgn: use new include-what-you-use exported declarations and fix warnings include-what-you-use/include-what-you-use#1164 fixed include-what-you-use/include-what-you-use#971 so that we can export forward declarations instead of hacking around it. I can't reproduce the issue with BINARY_OP_SIGNED_2C anymore either, so we can remove that hack, too. Also fix any other warnings. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2023-05-24 00:25:25 -07:00
Omar Sandoval	18b12a5c7b	libdrgn: get .eh_frame from the correct file We're currently getting .eh_frame from the debug file. However, since .eh_frame is an SHF_ALLOC section, it is actually in the loaded file, and may not be in the debug file. This causes us to fail to unwind in modules whose debug file was created with objcopy --only-keep-debug (which is typical for Linux distro debug files). Fix it by getting .eh_frame from the loaded file. To make this easier, we split .eh_frame and .debug_frame data into two separate tables. We also don't bother deduplicating them anymore, since GCC and Clang only seem to generate one or the other in practice. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Omar Sandoval	34f122144a	libdrgn: debug_info: wrap ELF file information in new struct drgn_elf_file struct drgn_module contains a bunch of information about the debug info file. Let's pull it out into its own structure, struct drgn_elf_file. This will be reused for the "main"/"loaded" file in an upcoming change. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-28 13:37:29 -08:00
Stephen Brennan	5f3a91f80d	Add StackFrame.locals() method The StackFrame's __getitem__() method allows looking up names in the scope of a stack frame, which is an incredibly useful tool for debugging. However, the names are not discoverable -- you must already be looking at the source code or some other source to know what names can be queried. To fix this, add a locals() method to StackFrame, which lists names that can be queried in the scope. Since this method is named locals(), it stops at the function scope and doesn't include globals or class members. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>	2022-11-02 22:40:33 -07:00
Omar Sandoval	87b7292aa5	Relicense drgn from GPLv3+ to LGPLv2.1+ drgn is currently licensed as GPLv3+. Part of the long term vision for drgn is that other projects can use it as a library providing programmatic interfaces for debugger functionality. A more permissive license is better suited to this goal. We decided on LGPLv2.1+ as a good balance between software freedom and permissiveness. All contributors not employed by Meta were contacted via email and consented to the license change. The only exception was the author of commit `c4fbf7e589` ("libdrgn: fix for compilation error"), who did not respond. That commit reverted a single line of code to one originally written by me in commit `640b1c011d` ("libdrgn: embed DWARF index in DWARF info cache"). Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-11-01 17:05:16 -07:00
Omar Sandoval	70af25849c	libdrgn: rename drgn_debug_info_module to drgn_module Eventually, modules will be exposed as part of the public libdrgn API, so they should have a clean name. Additionally, the module API I'm currently working on will allow modules for which we don't have the debug info file, so "debug info module" would be a misnomer. Also rename drgn_dwarf_module_info to drgn_module_dwarf_info and drgn_orc_module_info to drgn_module_orc_info to fit the new naming scheme better. Signed-off-by: Omar Sandoval <osandov@osandov.com>	2022-10-05 16:52:46 -07:00
Omar Sandoval	c0d8709b45	Update copyright headers to Meta Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-21 15:59:44 -08:00
Omar Sandoval	c3f31e28f9	libdrgn: reorganize and move DWARF index into dwarf_info.c

The upcoming introduction of a higher level data structure to represent a namespace has implications on the organization of the DWARF index and debug info management code. Basically, we're going to want to track what is currently known as struct drgn_dwarf_index_namespace as part of the new struct drgn_namespace. That only leaves the DWARF specification map and list of CUs in struct drgn_dwarf_index, which doesn't make much sense anymore. Instead, let's:

* Move the specification map and CUs into struct drgn_dwarf_info.
* Rename struct drgn_dwarf_index_namespace to struct drgn_namespace_dwarf_index to indicate that it is the "DWARF index for a namespace" rather than a "namespace of a DWARF index".
* Move the DWARF index implementation into dwarf_info.c. The DWARF index and debugging information management have always been coupled, so this makes it more explicit and is more convenient.
* Improve documentation and naming in the DWARF index implementation.

Now, the only DWARF-specific code outside of dwarf_info.c is for stack tracing, but we'll leave that for another day.

Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:55 -08:00
Omar Sandoval	5591d199b1	libdrgn: debug_info: split DWARF support into its own file Continuing the refactoring from the previous commit, move the DWARF code from debug_info.c to its own file, leaving only the generic ELF file management in debug_info.c Signed-off-by: Omar Sandoval <osandov@osandov.com>	2021-11-18 15:08:54 -08:00

15 Commits