Commit Graph

974 Commits

Author SHA1 Message Date
Omar Sandoval
fa081e32b9 libdrgn: update module section iterator for Linux v5.8
Linux v5.8 changed the module section structure, so we need to get the
section name differently.

Closes #73.

Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 13:07:36 -07:00
Omar Sandoval
1c6465f0b0 libdrgn: fix infinite loop on error caching kernel module sections
If cache_kernel_module_sections() in report_loaded_kernel_module()
fails, we continue to the next iteration without advancing to the next
kernel module. Then, we fail on that same kernel module and repeat. Make
sure that we go to the next kernel module.

Fixes: 423d2cd500 ("libdrgn: dwarf_index: rework file reporting")
Reported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 12:25:22 -07:00
Omar Sandoval
431b91ddb5 libdrgn: fix use-after-free in kernel module reporting error case
We're freeing path and then using it to report an error.

This has some weird knock-on effects. Since we freed the path, the error
message contains garbage. So, PyErr_SetString() can't decode it as a
UTF-8 string. The end result is a MissingDebugInfoError with no message.

Fix it by creating the error before freeing the path.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 11:55:06 -07:00
Omar Sandoval
2b325b9262 libdrgn: add an environment variable to disable use of /proc/modules and /sys/module
We use /proc/modules and /sys/module to find loaded kernel modules for
the running kernel instead of walking the module list in the core dump
as an optimization. To make it easier to test the core dump path, add an
environment variable to disable the optimization.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-13 11:24:39 -07:00
Omar Sandoval
661a5c56c3 libdrgn: refactor kernel module iterator
The next commit will allow using the offline path for the live kernel,
so the offline naming won't make much sense. Fold the offline path into
the top-level functions, and make the live path an escape hatch. Also
add some comments and improve naming for the file and directory handles
and update the coding style.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 23:22:01 -07:00
Omar Sandoval
ce8540e39c libdrgn: get rid of kernel_module_iterator::notes*
These were added in commit e5874ad18a ("libdrgn: use libdwfl"), but
they have never been used. Remove them.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 17:23:31 -07:00
Omar Sandoval
3c5d22637e libdrgn: clean up hash function APIs and improve documentation
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-12 16:20:08 -07:00
Omar Sandoval
761da83ddd libdrgn: add {min,max}_iconst() and rewrite min() and max()
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use it in hash_table.h. While we're here, let's split these into their
own header file and document them better.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-10 23:48:03 -07:00
Omar Sandoval
fa44171ba1 libdrgn: split bit operations into their own header
And improve their documentation.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 17:44:15 -07:00
Omar Sandoval
cae79d2676 libdrgn: add preprocessor utility macros
These will be used in upcoming changes.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-09 16:36:59 -07:00
Omar Sandoval
4cbb9b552a libdrgn: fix comparison of types with anonymous members
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-10-08 17:32:46 -07:00
Omar Sandoval
de6a4e07ae libdrgn: fix Doxygen
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-30 01:32:33 -07:00
Omar Sandoval
2704fd17c3 libdrgn: update Doxyfile
doxygen warns about a few obsolete Doxyfile options. Update it with
doxygen -u.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-29 23:04:08 -07:00
Omar Sandoval
d829401e5f vmtest: also disable onoatimehack on QEMU 5.0.1
The fix was backported to QEMU's 5.0 stable branch and released in
5.0.1.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:42:15 -07:00
Omar Sandoval
286c09844e Clean up #includes with include-what-you-use
I recently hit a couple of CI failures caused by relying on transitive
includes that weren't always present. include-what-you-use is a
Clang-based tool that helps with this. It's a bit finicky and noisy, so
this adds scripts/iwyu.py to make running it more convenient (but not
reliable enough to automate it in Travis).

This cleans up all reasonable include-what-you-use warnings and
reorganizes a few header files.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-23 16:29:42 -07:00
Omar Sandoval
fdbe336386 libdrgn: use -isystem for elfutils headers
The elfutils header files should be treated as if they were in the
standard location, so use -isystem instead of -I.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 15:45:10 -07:00
Omar Sandoval
89b5da2abb libdrgn: dwarf_index: free namespaces when rolling back
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
e69d0c0064 libdrgn: dwarf_index: fix use after free of pending CU
If we create a pending CU for a namespace, then add more CUs to the
index, the CU might get reallocated, resulting in a use after free. Fix
it by storing the index of the CU instead of the pointer.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
f83bb7c71b libdrgn: move debugging information tracking into drgn_debug_info
Debugging information tracking is currently in two places: drgn_program
finds debugging information, and drgn_dwarf_index stores it. Both of
these responsibilities make more sense as part of drgn_debug_info, so
let's move them there. This prepares us to track extra debugging
information that isn't pertinent to indexing.

This also reworks a couple of details of loading debugging information:

- drgn_dwarf_module and drgn_dwfl_module_userdata are consolidated into
  a single structure, drgn_debug_info_module.
- The first pass of DWARF indexing now happens in parallel with reading
  compilation units (by using OpenMP tasks).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-22 10:58:24 -07:00
Omar Sandoval
3ac9ae357b libdrgn: rename drgn_dwarf_info_cache to drgn_debug_info
The current name is too verbose. Let's go with a shorter, more generic
name.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-11 17:41:23 -07:00
Jay Kamat
d1beb0184a libdrgn: add support for objects in C++ namespaces
DWARF represents namespaces with DW_TAG_namespace DIEs. Add these to the
DWARF index, with each namespace being its own sub-index. We only index
the namespace itself when it is first accessed, which should help with
startup time and simplifies tracking.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2020-09-02 17:13:16 -07:00
Jay Kamat
a51abfcd70 libdrgn: dwarf_index: keep CUs after indexing
In order to index namespaces lazily, we need the CU structures. Rename
struct compilation_unit to the less generic struct drgn_dwarf_index_cu
and keep the CUs in a vector in the dindex.

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
66ad5077c9 libdrgn: dwarf_index: return indexed DIE entry from drgn_dwarf_index_iterator_next()
For namespace support, we will want to access the struct
drgn_dwarf_index_die for namespaces instead of the Dwarf_Die. Split
drgn_dwarf_index_get_die() out of drgn_dwarf_index_iterator_next().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
d512964c1e libdrgn: add drgn_error_copy()
This is needed for a future change where we'll want to save an error and
return it multiple times.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
7a85b4188e libdrgn: clean up read.h helpers and avoid undefined pointer behavior
There are a couple of related ways that we can cause undefined behavior
when parsing a malformed DWARF or depmod index file:

1. There are several places where we increment the cursor to skip past
   some data. It is undefined behavior if the result points out of
   bounds of the data, even if we don't attempt to dereference it.
2. read_in_bounds() checks that ptr <= end. This pointer comparison is
   only defined if ptr and end both point to elements of the same array
   object or one past the last element. If ptr has gone past end, then
   this comparison is likely undefined anyways.

Fix it by adding a helper to skip past data with bounds checking. Then,
all of the helpers can assume that ptr <= end and maintain that
invariant. while we're here and auditing all of the call sites, let's
clean up the API and rename it from read_foo() to the less generic
mread_foo().

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
c053c2b212 libdrgn: dwarf_index: handle DW_AT_specification with DW_FORM_ref_addr
Now that we can handle a DW_AT_specification that references another
compilation unit, add support for DW_FORM_ref_addr.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
26291647eb libdrgn: dwarf_index: handle DW_AT_specification DIEs with two passes
We currently handle DIEs with a DW_AT_specification attribute by parsing
the corresponding declaration to get the name and inserting the DIE as
usual. This has a couple of problems:

1. It only works if DW_AT_specification refers to the same compilation
   unit, which is true for DW_FORM_ref{1,2,4,8,_udata}, but not
   DW_FORM_ref_addr. As a result, drgn doesn't support the latter.
2. It assumes that the DIE with DW_AT_specification is in the correct
   "scope". Unfortunately, this is not true for g++: for a variable
   definition in a C++ namespace, it generates a DIE with
   DW_AT_declaration as a child of the DW_TAG_namespace DIE and a DIE
   which refers to the declaration with DW_AT_specification _outside_ of
   the DW_TAG_namespace as a child of the DW_TAG_compilation_unit DIE.

Supporting both of these cases requires reworking how we handle
DW_AT_specification. This commit takes an approach of parsing the DWARF
data in two passes: the first pass reads the abbrevation and file name
tables and builds a map of instances of DW_AT_specification; the second
pass indexes DIEs as before, but ignores DIEs with DW_AT_specification
and handles DIEs with DW_AT_declaration by looking them up in the map
built by the first pass.

This approach is a 10-20% regression in indexing time in the benchmarks
I ran. Thankfully, it is not 100% slower for a couple of reasons. The
first is that the two passes are simpler than the original combined
pass. The second is that a decent part of the indexing time is spent
faulting in the mapped debugging information, which only needs to happen
once (even if the file is cached, minor page faults add non-negligible
overhead).

This doesn't handle DW_AT_specification "chains" yet, but neither did
the original code. If it is necessary, it shouldn't be too difficult to
add.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
507977664c libdrgn: dwarf_index: store abbrevation and file name tables in CU
This is preparation for the next change where we'll need to do two
passes over the CUs.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
0b4ab1772b libdrgn: dwarf_index: store DIE indices as uint32_t
It's very unlikely that we'll ever index more than 4 billion DIEs in a
single shard, so we can shrink the index a bit by using uint32_t
indices (and uint8_t tag).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
9ce9094ee0 libdrgn: dwarf_index: don't copy sections into each CU
I originally copied the sections into each compilation unit to avoid a
pointer indirection, but performance-wise it's a wash, so we might as
well save the memory. This will be more important when we keep the CUs
after indexing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
94e7b1f92c libdrgn: dwarf_index: avoid copying CUs for one thread
In read_cus(), the master thread can use the final CUs vector directly
and the rest of the threads can merge their private vectors in. This
consistently shaves a few milliseconds off of startup.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
53ba7262cd libdrgn: dwarf_index: handle DW_AT_declaration with DW_FORM_flag
We currently assume that if DW_AT_declaration is present, it is true.
This seems to be true in practice, and I see no reason to ever use
DW_FORM_flag with a value of zero. There's no performance hit to handle
it, though, so we might as well.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
ea9f3f3114 libdrgn: dwarf_index: don't worry about tag of CU DIE
As a small simplification, we can take commit 9bb2ccecb7 ("Enable
DWARF indexing to work with partial units") further and not look at the
tag of the top-level DIE at all.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
c8f84c57fb libdrgn: dwarf_index: use size_t instead of uint64_t where appropriate
The CU unit length and DIE offset are both limited by the size of the
mapped debugging information, i.e., size_t.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
2252bef1a7 libdrgn: dwarf_index: rename TAG_FLAG_* and TAG_MASK to DIE_FLAG_*
This is more clear: although these flags happen to be encoded with the
DWARF tag, they are flags regarding the DIE.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
85c4b36820 libdrgn: dwarf_index: fix leak when parsing bad line number program header
If we fail to read an include directory in read_file_name_table(), we
need to free the directory hashes.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-09-02 17:13:16 -07:00
Omar Sandoval
d4b1711128 setup.py: add 5.9 to vmtest kernels
Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 16:29:03 -07:00
Omar Sandoval
5ce80016c5 docs: use a different example for execscript()
We have a real task_state_to_char() helper now, so we shouldn't use a
half-baked implementation of it as an example.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 14:16:19 -07:00
Omar Sandoval
ff96c75da0 helpers: translate task_state_to_char() to Python
Commit 326107f054 ("libdrgn: add task_state_to_char() helper")
implemented task_state_to_char() in libdrgn so that it could be used in
commit 4780c7a266 ("libdrgn: stack_trace: prohibit unwinding stack of
running tasks"). As of commit eea5422546 ("libdrgn: make Linux kernel
stack unwinding more robust"), it is no longer used in libdrgn, so we
can translate it to Python. This removes a bunch of code and is more
useful as an example.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 13:54:39 -07:00
Omar Sandoval
36068a0ea8 Fix trailing commas for Black v20.8b1
Black was recently changed to treat a trailing comma as an indicator to
put each item/argument on its own line. We have a bunch of places where
something previously had to be split into multiple lines, then was
edited to fit on one line, but Black kept the trailing comma. Now this
update wants to unnecessarily split it back up. For now, let's get rid
of these commas. Hopefully in the future Black has a way to opt out of
this.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
e96d9fd3fd libdrgn/python: don't allow None for Program.object() flags
Similar to the previous commit, this was to work around pydoc issues
that we don't have anymore.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
2fc514f2a4 libdrgn/python: add Qualifiers.NONE and stop using Optional[Qualifiers]
I originally did it this way because pydoc doesn't handle non-trivial
defaults in signature very well (see commit 67a16a09b8 ("tests: test
that Python documentation renders")). drgndoc doesn't generate signature
for pydoc anymore, though, so we don't need to worry about it and can
clean up the typing.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:29 -07:00
Omar Sandoval
e49a87a3d7 libdrgn: remove struct drgn_object::prog
We can get it via the type now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-27 11:31:21 -07:00
Omar Sandoval
a97f6c4fa2 Associate types with program
I originally envisioned types as dumb descriptors. This mostly works for
C because in C, types are fairly simple. However, even then the
drgn_program_member_info() API is awkward. You should be able to look up
a member directly from a type, but we need the program for caching
purposes. This has also held me back from adding offsetof() or
has_member() APIs.

Things get even messier with C++. C++ template parameters can be objects
(e.g., template <int N>). Such parameters would best be represented by a
drgn object, which we need a drgn program for. Static members are a
similar case.

So, let's reimagine types as being owned by a program. This has a few
parts:

1. In libdrgn, simple types are now created by factory functions,
   drgn_foo_type_create().
2. To handle their variable length fields, compound types, enum types,
   and function types are constructed with a "builder" API.
3. Simple types are deduplicated.
4. The Python type factory functions are replaced by methods of the
   Program class.
5. While we're changing the API, the parameters to pointer_type() and
   array_type() are reordered to be more logical (and to allow
   pointer_type() to take a default size of None for the program's
   default pointer size).
6. Likewise, the type factory methods take qualifiers as a keyword
   argument only.

A big part of this change is updating the tests and splitting up large
test cases into smaller ones in a few places.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:41:09 -07:00
Omar Sandoval
c31208f69c libdrgn: fold drgn_type_index into drgn_program
This is preparation for associating types with a program.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:36:35 -07:00
Omar Sandoval
1c8181e22d libdrgn: rearrange struct drgn_program members
struct drgn_program has a bunch of state scattered around. Group it
together more logically, even if it means sacrificing some padding here
and there.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 17:34:44 -07:00
Omar Sandoval
d4e0771f87 libdrgn: return error from drgn_program_{is_little_endian,bswap,is_64_bit}()
Most places that call these check has_platform and return an error, and
those that don't can live with the extra check.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
a8d632b4c1 libdrgn/python: use F14 instead of PyDict for Program::objects
Program::objects is used to store references to objects that must stay
alive while the Program is alive. It is currently a PyDict where the
keys are the object addresses as PyLong and the values are the objects
themselves. This has two problems:

1. Allocating the key as a full object is obviously wasteful.
2. PyDict doesn't have an API for reserving capacity ahead of time,
   which we want for an upcoming change.

Both of these are easily fixed by using our own hash table.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
b0f9403ebf drgndoc: directly use name passed as argument to drgndoc directive
E.g., drgndoc:: foo.bar() should emit py:method:: foo.bar() regardless
of a previous py:module directive.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:56:28 -07:00
Omar Sandoval
93e33513da drgndoc: bring back :exclude:
It's still useful to have an escape hatch for names we don't want
documented.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-26 16:36:55 -07:00