363 Commits

Author SHA1 Message Date
Omar Sandoval
68f7b87d6a libdrgn: ignore physical core dump segments with address -1
/proc/kcore contains segments which don't have a valid physical address,
which it indicates with a p_paddr of -1. Skip those segments; otherwise,
we get an overflow error from the memory reader.
2019-05-26 14:49:48 -07:00
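The filtering rule can be sketched in Python. This assumes 64-bit ELF, where p_paddr is an unsigned 64-bit field and -1 therefore appears as all ones. `Segment` is a hypothetical stand-in for a parsed program header, not a drgn type.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    # Hypothetical stand-in for a parsed ELF64 PT_LOAD program header.
    p_vaddr: int
    p_paddr: int
    p_memsz: int


# p_paddr is unsigned in ELF64, so "-1" is stored as 2**64 - 1.
INVALID_PADDR = (1 << 64) - 1


def physical_segments(segments: List[Segment]) -> List[Segment]:
    """Keep only segments that have a usable physical address."""
    return [seg for seg in segments if seg.p_paddr != INVALID_PADDR]
```

Segments skipped this way can still be registered by virtual address; only the physical mapping is left out.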
Omar Sandoval
c0bc72b0ea libdrgn: use splay tree for memory reader
The current array-based memory reader has a bug in the following
scenario:

    prog.add_memory_segment(0xffff0000, 128, ...)
    # This should replace a subset of the first segment.
    prog.add_memory_segment(0xffff0020, 32, ...)
    # This moves the first segment back to the front of the array.
    prog.read(0xffff0000, 32)
    # This finds the first segment instead of the second segment.
    prog.read(0xffff0032, 32)

Fix it by using the newly added splay tree. This also splits up the
virtual and physical memory segments into separate trees.
2019-05-24 17:48:08 -07:00
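The intended semantics (a newer segment replaces whatever part of an older segment it overlaps) can be sketched with a sorted list in Python. This is not a splay tree: insertion here is O(n), and `SegmentMap` and its callback signature `(address, count) -> bytes` are hypothetical names for illustration only.

```python
import bisect
from typing import Callable, List, Tuple

# Hypothetical read callback: (absolute address, byte count) -> bytes.
ReadFn = Callable[[int, int], bytes]


class SegmentMap:
    """Non-overlapping [start, end) ranges; later additions win on overlap."""

    def __init__(self) -> None:
        self._segs: List[Tuple[int, int, ReadFn]] = []
        self._starts: List[int] = []

    def add(self, start: int, size: int, fn: ReadFn) -> None:
        end = start + size
        kept = []
        for s, e, f in self._segs:
            if e <= start or s >= end:
                kept.append((s, e, f))  # No overlap: keep as-is.
                continue
            # Keep only the pieces of the old segment outside the new one.
            if s < start:
                kept.append((s, start, f))
            if e > end:
                kept.append((end, e, f))
        kept.append((start, end, fn))
        kept.sort(key=lambda seg: seg[0])
        self._segs = kept
        self._starts = [s for s, _, _ in kept]

    def read(self, address: int, count: int) -> bytes:
        out = bytearray()
        while count > 0:
            # Find the last segment starting at or before address.
            i = bisect.bisect_right(self._starts, address) - 1
            if i < 0 or self._segs[i][1] <= address:
                raise LookupError(f"could not find memory segment for {address:#x}")
            _, end, fn = self._segs[i]
            n = min(count, end - address)
            out += fn(address, n)
            address += n
            count -= n
        return bytes(out)
```

Replaying the scenario above: after adding the 32-byte segment at 0xffff0020, a read at 0xffff0000 comes from the first segment and a read starting inside 0xffff0020..0xffff003f comes from the second.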
Omar Sandoval
10fb398338 libdrgn: add splay tree implementation
This will be used to track memory segments instead of the array we
currently use. The API is based on the hash table API; it can support
alternative implementations in the future, like red-black trees.
2019-05-24 17:48:08 -07:00
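A minimal top-down splay tree (Sleator's formulation) sketched in Python, with an insert that refuses duplicates and a "greatest key <= address" search, which is the lookup a memory reader needs to find the segment containing an address. The class and method names are hypothetical and do not mirror libdrgn's C API.

```python
class Node:
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


class SplayTree:
    def __init__(self):
        self.root = None

    def _splay(self, key):
        """Move the node with key (or the last node on its search path) to the root."""
        if self.root is None:
            return
        header = Node(None, None)
        left = right = header
        t = self.root
        while True:
            if key < t.key:
                if t.left is None:
                    break
                if key < t.left.key:  # Zig-zig: rotate right.
                    y = t.left
                    t.left, y.right = y.right, t
                    t = y
                    if t.left is None:
                        break
                right.left, right, t = t, t, t.left  # Link right.
            elif key > t.key:
                if t.right is None:
                    break
                if key > t.right.key:  # Zig-zig: rotate left.
                    y = t.right
                    t.right, y.left = y.left, t
                    t = y
                    if t.right is None:
                        break
                left.right, left, t = t, t, t.right  # Link left.
            else:
                break
        # Reassemble the left, middle, and right trees.
        left.right, right.left = t.left, t.right
        t.left, t.right = header.right, header.left
        self.root = t

    def insert(self, key, value):
        """Insert key; return False (without overwriting) if it already exists."""
        if self.root is None:
            self.root = Node(key, value)
            return True
        self._splay(key)
        if key == self.root.key:
            return False
        node = Node(key, value)
        if key < self.root.key:
            node.left, node.right = self.root.left, self.root
            self.root.left = None
        else:
            node.left, node.right = self.root, self.root.right
            self.root.right = None
        self.root = node
        return True

    def search_le(self, key):
        """Return (key, value) of the greatest entry <= key, or None."""
        if self.root is None:
            return None
        self._splay(key)
        if self.root.key <= key:
            return (self.root.key, self.root.value)
        # Root is the successor; the predecessor is the max of its left subtree.
        node = self.root.left
        if node is None:
            return None
        while node.right is not None:
            node = node.right
        return (node.key, node.value)
```

Splaying keeps recently accessed segments near the root, which suits the access pattern of repeated reads from the same region.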
Omar Sandoval
dcddaa2cc1 libdrgn: revamp hash table API
This makes several improvements to the hash table API.

The first three changes make things more general in order to be consistent
with the upcoming binary search tree API:

- Items are renamed to entries.
- Positions are renamed to iterators.
- hash_table_empty() is added.

One change makes the definition API more convenient:

- It is no longer necessary to pass the types into
  DEFINE_HASH_{MAP,SET}_FUNCTIONS().

A few changes take some good ideas from the C++ STL:

- hash_table_insert() now fails on duplicates instead of overwriting.
- hash_table_delete_iterator() returns the next iterator.
- hash_table_next() returns an iterator instead of modifying it.

One change reduces memory usage:

- The lower-level DEFINE_HASH_TABLE() is cleaned up and exposed as an
  alternative to DEFINE_HASH_MAP() and DEFINE_HASH_SET(). This lets us
  drop the duplicated key where a hash map value already embeds the key
  (the DWARF index file table) and removes the need to make a dummy hash
  set entry to do a search (the pointer and array type caches).
2019-05-24 17:48:05 -07:00
Omar Sandoval
0026eeae66 libdrgn: find kernel module name in kernels < v4.13
If we can't find the module name in .modinfo, fall back to
.gnu.linkonce.this_module.
2019-05-14 15:39:16 -07:00
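The lookup order can be sketched in Python. .modinfo holds NUL-separated "key=value" strings; the fallback that reads the name out of .gnu.linkonce.this_module is left to a caller-supplied callback, because decoding struct module needs type information this sketch doesn't model. Both function names are hypothetical.

```python
from typing import Callable, Optional


def module_name_from_modinfo(modinfo: bytes) -> Optional[str]:
    """Return the value of the "name=" entry in a .modinfo section, if any."""
    for entry in modinfo.split(b"\0"):
        if entry.startswith(b"name="):
            return entry[len(b"name="):].decode()
    return None


def find_module_name(
    modinfo: Optional[bytes], read_this_module_name: Callable[[], str]
) -> str:
    name = module_name_from_modinfo(modinfo) if modinfo is not None else None
    if name is None:
        # Hypothetical fallback for kernels < v4.13: read the name from
        # the struct module stored in .gnu.linkonce.this_module.
        name = read_this_module_name()
    return name
```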
Omar Sandoval
e4b8af7807 libdrgn/python: fix uninitialized variable in Program.load_debug_info()
path_converter() reads arg->allow_none, so make sure we zero the array
of path arguments.
2019-05-14 14:28:03 -07:00
Omar Sandoval
39876ccbac libdrgn: find kernel module debuginfo on Debian
Debian's linux-image*-dbg packages name the ELF files without the extra
.debug suffix that Fedora includes.
2019-05-14 12:42:56 -07:00
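A sketch of trying both naming conventions in Python. The /usr/lib/debug/lib/modules/<release>/ layout and the helper names are assumptions for illustration, not drgn's actual search logic.

```python
import os
from typing import List, Optional


def debuginfo_candidates(debug_root: str, release: str, rel_path: str) -> List[str]:
    """Candidate debug files for a module at rel_path (e.g. kernel/fs/ext4/ext4.ko)."""
    # Assumed layout: <debug_root>/lib/modules/<release>/<rel_path>
    base = os.path.join(debug_root, "lib", "modules", release, rel_path)
    # Fedora-style names add ".debug"; Debian-style names don't.
    return [base + ".debug", base]


def find_debuginfo(debug_root: str, release: str, rel_path: str) -> Optional[str]:
    for path in debuginfo_candidates(debug_root, release, rel_path):
        if os.path.isfile(path):
            return path
    return None
```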
Omar Sandoval
ac27f2c1ec libdrgn: only load debug information from loaded kernel modules
Currently, we load debug information for every kernel module that we
find under /lib/modules/$(uname -r)/kernel. This has a few issues:

1. Distribution kernels have lots of modules (~3000 for Fedora and
   Debian).
   a) This can exceed the default soft limit on the number of open file
      descriptors.
   b) The mmap'd debug information can trip the overcommit heuristics
      and cause OOM kills.
   c) It can take a long time to parse all of the debug information.
2. Not all modules are under the "kernel" directory; some distros also
   have an "extra" directory.
3. The user is not made aware of loaded kernel modules that don't have
   debug information available.

So, instead of walking /lib/modules, walk the list of loaded kernel
modules and look up their debugging information.
2019-05-14 11:55:39 -07:00
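The new flow (start from the loaded modules, look each one up, and report the ones with no debug information) might look like this in Python. `find_debug_file` is a hypothetical callback standing in for whatever search is used; it isn't a drgn function.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def match_loaded_modules(
    loaded: Iterable[str], find_debug_file: Callable[[str], Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Split loaded module names into (found debug files, names with none)."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for name in loaded:
        path = find_debug_file(name)
        if path is None:
            missing.append(name)  # Surface these to the user as warnings.
        else:
            found[name] = path
    return found, missing
```

Only the modules in `found` need to be opened and mapped, which bounds file descriptor and memory use by what is actually loaded.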
Omar Sandoval
efaec41ca2 libdrgn: add string_builder_append_error()
2019-05-14 11:49:52 -07:00
Omar Sandoval
e21ed988fb libdrgn: add drgn_error_from_string_builder()
And use that instead of exposing drgn_error_create_nodup().
2019-05-14 10:16:01 -07:00
Omar Sandoval
b0f10d3b58 libdrgn: pass enum drgn_error_code to error constructors
2019-05-14 10:13:05 -07:00
Omar Sandoval
f08d4c9a08 libdrgn: make string_builder API return bool
It can only fail when out of memory, so simplify it.
2019-05-14 10:07:50 -07:00
Omar Sandoval
0135dbd0cc libdrgn: get loaded module names from /proc/modules when possible
Similar to the last optimization, for the running kernel, we can just
read /proc/modules instead of walking the kernel data structures.
2019-05-13 18:05:17 -07:00
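Reading /proc/modules is straightforward because each line starts with the module name followed by whitespace-separated fields. A Python sketch (function names are hypothetical):

```python
from typing import List


def parse_proc_modules(text: str) -> List[str]:
    """Return module names; the name is the first field of each line."""
    return [line.split(None, 1)[0] for line in text.splitlines() if line.strip()]


def loaded_module_names(path: str = "/proc/modules") -> List[str]:
    with open(path) as f:
        return parse_proc_modules(f.read())
```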
Omar Sandoval
ed6a6f0b3e libdrgn: get module section address from sysfs when possible
In the running kernel, we don't have to walk the list of modules and
module sections, since we can just look it up directly in sysfs.
2019-05-13 18:05:09 -07:00
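A Python sketch of the sysfs lookup: each file under /sys/module/<name>/sections/ holds the section's load address in hex. Reading real addresses typically requires root. The `sysfs_root` parameter is a hypothetical addition so the function can be pointed at a test directory.

```python
import os


def module_section_address(module: str, section: str, sysfs_root: str = "/sys") -> int:
    """Return the load address of a module section, e.g. ("ext4", ".text")."""
    path = os.path.join(sysfs_root, "module", module, "sections", section)
    with open(path) as f:
        # Contents look like "0xffffffffc0400000\n"; int() accepts the 0x prefix.
        return int(f.read().strip(), 16)
```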
Omar Sandoval
f11e030aaa libdrgn: factor out kernel module iteration and section lookup
2019-05-13 16:39:30 -07:00
Omar Sandoval
9b563170f8 libdrgn: make load_debug_info() API saner
Rather than exposing the underlying open and load steps of the DWARF
index, simplify it down to a single load step.
2019-05-13 15:04:27 -07:00
Omar Sandoval
60c9e26ff5 libdrgn: make C string hash table helpers work with char *
const char * const * is not compatible with char * const *, so make
c_string_hash() and c_string_eq() macros so they can work with both
const char * and char * keys.
2019-05-13 11:52:55 -07:00
Omar Sandoval
1206730730 libdrgn: fix hash_set_insert_searched_pos() documentation
2019-05-13 11:52:55 -07:00
Omar Sandoval
ab58a5bff0 libdrgn: determine default size_t and ptrdiff_t more intelligently
Currently, size_t and ptrdiff_t default to typedefs of the default
unsigned long and long, respectively, regardless of what the program
actually defines unsigned long or long as. Instead, make them refer to
whichever integer type (long, long long, or int) is the same size as the
word size.
2019-05-10 15:14:03 -07:00
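The selection rule in Python, checking candidates in the order the message lists them. The size table is supplied by the caller here; in a real program it would come from the target's type information. The function name is hypothetical.

```python
from typing import Dict

# Candidate order from the commit message: long, long long, then int.
CANDIDATES = ("long", "long long", "int")


def word_sized_type(sizes: Dict[str, int], word_size: int, signed: bool) -> str:
    """Pick the type for ptrdiff_t (signed=True) or size_t (signed=False)."""
    for base in CANDIDATES:
        name = base if signed else "unsigned " + base
        if sizes.get(name) == word_size:
            return name
    raise LookupError(f"no integer type with size {word_size}")
```

For example, on an LLP64 target (long is 4 bytes, long long is 8, 8-byte words), this yields "unsigned long long" for size_t instead of "unsigned long".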
Omar Sandoval
83285d3f29 cli: add --symbols and --no-default-symbols
Now we can finally, for example, use vmlinux in a non-standard location.
2019-05-10 13:20:03 -07:00
Omar Sandoval
e65ff24975 cli: group together -c, -k, and -p in --help output
2019-05-10 13:11:48 -07:00
Omar Sandoval
8231414dc9 cli: add --quiet option
Currently, we print warnings for interactive mode but not for script
mode. Make it more explicit with a --quiet option.
2019-05-10 12:57:07 -07:00
Omar Sandoval
baba1ff3f0 libdrgn: make program components pluggable
Currently, programs can be created for three main use-cases: core dumps,
the running kernel, and a running process. However, internally, the
program memory, types, and symbols are pluggable. Expose that as a
callback API, which makes it possible to use drgn in much more creative
ways.
2019-05-10 12:41:07 -07:00
Omar Sandoval
ac946ba8a7 libdrgn: fix zero-filling reads from core dump segments
2019-05-09 16:35:48 -07:00
Omar Sandoval
15bc0286b9 libdrgn: add drgn_error_create_nodup()
2019-05-09 16:35:14 -07:00
Omar Sandoval
fb10623903 libdrgn: use next_power_of_two() for string_builder
2019-05-09 16:01:07 -07:00
Omar Sandoval
6d7b0631b9 libdrgn: simplify string_builder API
Instead of maintaining a null-terminated string, null-terminate just
before returning the string as a "finalize" step.
2019-05-09 15:25:58 -07:00
Omar Sandoval
5200a6652c libdrgn: embed memory reader, type index, and symbol index in program
2019-05-06 14:55:34 -07:00
Omar Sandoval
bb2357bc09 libdrgn: don't require word size for type index initialization
2019-05-06 14:55:34 -07:00
Omar Sandoval
640b1c011d libdrgn: embed DWARF index in DWARF info cache
2019-05-06 14:55:34 -07:00
Omar Sandoval
2ed8e3148c libdrgn: get architecture info from core file instead of DWARF index
2019-05-06 14:55:34 -07:00
Omar Sandoval
ba162ac001 libdrgn: remove endianness from type index
The type index doesn't need to know or care about endianness. Move it to
the program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
565e0343ef libdrgn: make symbol index pluggable with callbacks
The last piece of making the major program components pluggable.
2019-05-06 14:55:34 -07:00
Omar Sandoval
47151c4554 libdrgn/python: add Program.object()
2019-05-06 14:55:34 -07:00
Omar Sandoval
0ea2825817 libdrgn: refactor gen_constants.py
All of the constants are generated with basically the same code, so
refactor it.
2019-05-06 14:55:34 -07:00
Omar Sandoval
9c6575e783 libdrgn: move relocation hook to drgn_info_cache
2019-05-06 14:55:34 -07:00
Omar Sandoval
52a8681a8d libdrgn: rename drgn_dwarf_type_cache to drgn_dwarf_info_cache
This is in preparation for sharing it with the symbol index.
2019-05-06 14:55:34 -07:00
Omar Sandoval
a98445c277 libdrgn: make type index pluggable with callbacks
Similar to "libdrgn: make memory reader pluggable with callbacks", we
want to support custom type indexes (imagine, e.g., using drgn to parse
a binary format). For now, this disables the DWARF index tests; we'll
have a better way to test them later, so let's not bother adding more
test scaffolding.
2019-05-06 14:55:34 -07:00
Omar Sandoval
77443cecd1 libdrgn: use NULL-terminated arrays for mock {type,symbol} index
This will simplify the next couple of changes.
2019-05-06 14:55:34 -07:00
Omar Sandoval
c2be52dff0 libdrgn: rename object index to symbol index
An "object index" doesn't actually index objects, but really "partial
objects" -- i.e., a type + address. "Symbol" is a better name for this.
2019-05-06 14:55:34 -07:00
Omar Sandoval
417a6f0d76 libdrgn: make memory reader pluggable with callbacks
I've been planning to make memory readers pluggable (in order to support
use cases like, e.g., reading a core file over the network), but the
C-style "inheritance" drgn uses internally is awkward as a library
interface; it's much easier to just register a callback. This change
effectively makes drgn_memory_reader a mapping from a memory range to an
arbitrary callback. As a bonus, this means that read callbacks can be
mixed and matched; a part of memory can be in a core file, another part
can be in the executable file, and another part could be filled from an
arbitrary buffer.
2019-05-06 14:55:34 -07:00
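The "range to arbitrary callback" idea can be sketched in Python, mixing a file-backed region, an in-memory buffer, and a zero-filled region. Everything here is hypothetical, including the callback signature `(offset within segment, count) -> bytes`; the lookup is a simple linear scan that assumes segments don't overlap.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: (offset within the segment, count) -> bytes.
ReadFn = Callable[[int, int], bytes]


def buffer_reader(buf: bytes) -> ReadFn:
    return lambda offset, count: buf[offset:offset + count]


def file_reader(path: str, file_offset: int) -> ReadFn:
    def read(offset: int, count: int) -> bytes:
        with open(path, "rb") as f:
            f.seek(file_offset + offset)
            return f.read(count)
    return read


def zero_reader() -> ReadFn:
    return lambda offset, count: bytes(count)


class MemoryReader:
    def __init__(self) -> None:
        self.segments: List[Tuple[int, int, ReadFn]] = []

    def add_segment(self, address: int, size: int, fn: ReadFn) -> None:
        self.segments.append((address, address + size, fn))

    def read(self, address: int, count: int) -> bytes:
        out = bytearray()
        while count > 0:
            for start, end, fn in self.segments:
                if start <= address < end:
                    n = min(count, end - address)
                    data = fn(address - start, n)
                    if len(data) != n:
                        raise IOError(f"short read at {address:#x}")
                    out += data
                    address += n
                    count -= n
                    break
            else:
                raise LookupError(f"could not find memory segment for {address:#x}")
        return bytes(out)
```

A single read can span segments backed by different sources, since each piece is dispatched to its own callback.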
Omar Sandoval
e78e6c9226 libdrgn/python: improve Python exception handling
Make {set,clear}_drgn_in_python() handle nested calls, and return a
valid error to libdrgn rather than -1.
2019-05-06 14:55:34 -07:00
Omar Sandoval
d0633d2f1a libdrgn: allow multiple cleanup callbacks
For now, this makes things slightly awkward, but it will be necessary
for the upcoming changes making drgn_program more pluggable.
2019-05-06 14:55:34 -07:00
Omar Sandoval
043cddf6d8 libdrgn: move member cache to type index
It makes more sense here than in struct drgn_program.
2019-05-06 14:55:34 -07:00
Omar Sandoval
3645ce78ea libdrgn: assume all pointers have same size in type index
2019-05-06 14:55:34 -07:00
Omar Sandoval
06960f591c libdrgn: look up primitive types on demand
Instead of caching all primitive types ahead of time, look them up on
demand. This is preparation for making the type index API more flexible.
2019-05-06 14:55:34 -07:00
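On-demand lookup with memoization can be sketched in Python: nothing is resolved until first use, and each name is resolved at most once. `LazyTypeCache` and its `find` callback are hypothetical, not part of drgn.

```python
from typing import Callable, Dict, Optional


class LazyTypeCache:
    """Resolve types on first use and remember the result."""

    def __init__(self, find: Callable[[str], Optional[object]]) -> None:
        self._find = find
        self._cache: Dict[str, object] = {}

    def get(self, name: str) -> object:
        if name not in self._cache:
            found = self._find(name)
            if found is None:
                raise LookupError(f"could not find type {name!r}")
            self._cache[name] = found
        return self._cache[name]
```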
Omar Sandoval
932b7857b5 libdrgn: expose primitive type concept to public interface
Previously known as c_type.
2019-05-06 14:55:34 -07:00
Omar Sandoval
bad1b9b33c libdrgn/python: use generated docstrings for generated types
I missed ProgramFlags, Qualifiers, and TypeKind.
2019-05-06 14:55:34 -07:00
Omar Sandoval
839252564a libdrgn: deduplicate files in DWARF index
Currently, we deduplicate files for userspace mappings manually.
However, to prepare for adding symbol files at runtime, move the
deduplication to DWARF index. In the future, we probably want to
deduplicate based on build ID, as well.
2019-05-06 14:55:34 -07:00
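One plausible way to deduplicate files, sketched in Python under the assumption that two paths name the same file when they share a device and inode number. This is an illustration, not necessarily what the DWARF index does; build ID matching, as the message notes, would also catch identical copies at different inodes.

```python
import os
from typing import Iterable, List


def dedup_files(paths: Iterable[str]) -> List[str]:
    """Drop paths that refer to an already-seen (st_dev, st_ino) pair."""
    seen = set()
    result: List[str] = []
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            result.append(path)
    return result
```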
Omar Sandoval
7dbf8447b2 tests: test comparison between Type and other object
2019-05-06 14:55:34 -07:00