Commit Graph

1994 Commits

Author SHA1 Message Date
Omar Sandoval
263f8c2c4d dwarf: update DWARF constants
Update generate_dwarf_constants.py to parse dwarf.h, and update it with
the dwarf.h on my system.
2018-06-27 22:31:01 -07:00
Omar Sandoval
2a76972e68 Get symbol address from DWARF/ELF info instead of /proc/kallsyms
This has a few benefits:

1. We no longer have to parse /proc/kallsyms, which actually takes just
   as long as parsing all of the DWARF files
2. We can support vmcores
3. We can find the address of a specific variable even if it has the
   same name as other variables in the same object file
2018-06-27 21:45:36 -07:00
Omar Sandoval
266b35a7ae dwarfindex/elf: save file path in ElfFile 2018-06-27 19:51:06 -07:00
Omar Sandoval
0adac0747c Bring back ElfFile
Instead of constructing DwarfFile with a dict of sections, pass in an
ElfFile.
2018-06-25 23:43:14 -07:00
Omar Sandoval
68ebade7fa dwarfindex: fix number of symbols calculation 2018-06-25 21:47:53 -07:00
Omar Sandoval
d88ad8bd1d cli: read OS release from vmcoreinfo instead of uname
This is preparation for a couple of big future changes:

1. Using the ELF symbol table instead of /proc/kallsyms
2. Supporting vmcore crash dumps
2018-06-25 20:52:52 -07:00
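A rough illustration of the idea in d88ad8bd1d: vmcoreinfo is a block of newline-separated KEY=VALUE text, and the kernel release is stored under the OSRELEASE key. This is only a sketch, not the code from the commit; the function name and the sample data are made up for the example.

```python
def parse_vmcoreinfo(data: bytes) -> dict:
    """Parse vmcoreinfo text (KEY=VALUE lines) into a dict of strings."""
    info = {}
    for line in data.decode("ascii", errors="replace").splitlines():
        # Split on the first '=' only; values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not KEY=VALUE
            info[key] = value
    return info


# Hypothetical sample; a real vmcoreinfo note has many more entries.
sample = b"OSRELEASE=4.17.0\nPAGESIZE=4096\nSYMBOL(init_uts_ns)=ffffffff82212480\n"
osrelease = parse_vmcoreinfo(sample)["OSRELEASE"]
```

Reading the release this way works for both a live kernel and a crash dump, which is why it does not depend on uname.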
Omar Sandoval
8340afb03b corereader: add context manager methods 2018-06-25 19:59:22 -07:00
Omar Sandoval
15c3db0936 corereader: support reading from physical addresses
/proc/kcore and vmcores include the physical memory address in the
program headers. Reading from physical memory can be useful, so support
it in CoreReader.
2018-06-25 00:41:29 -07:00
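The address translation behind 15c3db0936 can be sketched as follows. Each loadable segment in the core file records a file offset, a virtual address, and a physical address, so the same lookup works for either address space. The Segment type, the function name, and the sample values are hypothetical and do not reflect CoreReader's actual interface.

```python
from typing import NamedTuple, Sequence


class Segment(NamedTuple):
    offset: int  # p_offset: where the segment's data starts in the file
    vaddr: int   # p_vaddr: virtual address of the segment
    paddr: int   # p_paddr: physical address of the segment
    filesz: int  # p_filesz: number of bytes stored in the file


def file_offset(segments: Sequence[Segment], address: int,
                physical: bool = False) -> int:
    """Translate a virtual or physical address to an offset in the core file."""
    for seg in segments:
        start = seg.paddr if physical else seg.vaddr
        if start <= address < start + seg.filesz:
            return seg.offset + (address - start)
    raise ValueError(f"address {address:#x} is not in any segment")


# Made-up segment: 8 KiB of physical memory starting at 0, mapped at an
# assumed direct-map virtual address.
segments = [Segment(offset=0x1000, vaddr=0xFFFF880000000000, paddr=0x0,
                    filesz=0x2000)]
virt_off = file_offset(segments, 0xFFFF880000000010)
phys_off = file_offset(segments, 0x10, physical=True)
```

Both lookups land on the same byte in the file, since they name the same memory through different address spaces.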
Omar Sandoval
26d3880708 dwarfindex: move indexing to a method instead of __init__()
We want to allow indexing files on the fly instead of all up front, so
move the indexing to a new DwarfIndex.add() method.
2018-06-25 00:25:49 -07:00
Omar Sandoval
6841eaace4 dwarfindex: replace cu->file and die_hash_entry->cu pointers with indices
We need this in order to be able to reallocate the files and cus arrays.
It also shrinks the size of struct die_hash_entry back to what it was
before file_name_hash was added.
2018-06-24 19:57:56 -07:00
Omar Sandoval
e2caf2bb0d dwarfindex: don't index any declarations and handle specifications
For variables which are predeclared, GCC generates a DW_TAG_variable DIE
with DW_AT_name and DW_AT_declaration as well as a DW_TAG_variable DIE
without DW_AT_name but with DW_AT_specification pointing to the
declaration DIE. We should index the latter, not the former. This has a
couple of benefits: we can skip indexing variable declaration DIEs,
which contribute a lot of duplicate hash table insertions; and, we can
always get the address of a variable from DW_AT_location of the indexed
DIE instead of having to parse the symbol table.
2018-06-23 00:23:30 -07:00
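The indexing rule from e2caf2bb0d, modeled with plain dictionaries standing in for DIEs. This is a simplified sketch, not the C indexer: the dictionary layout, the function name, and the sample address are assumptions, and a real DW_AT_location is a DWARF expression rather than an integer.

```python
def index_variables(dies):
    """Map variable name -> list of defining DIEs, skipping declarations."""
    index = {}
    for die in dies:
        if die["tag"] != "DW_TAG_variable":
            continue
        if die.get("DW_AT_declaration"):
            continue  # declarations are never indexed
        # A definition may carry no name of its own and instead point at
        # its declaration through DW_AT_specification.
        spec = die.get("DW_AT_specification")
        name = die.get("DW_AT_name") or (spec and spec.get("DW_AT_name"))
        if name is not None:
            index.setdefault(name, []).append(die)
    return index


# Hypothetical pair of DIEs for a predeclared variable.
decl = {"tag": "DW_TAG_variable", "DW_AT_name": "jiffies",
        "DW_AT_declaration": True}
defn = {"tag": "DW_TAG_variable", "DW_AT_specification": decl,
        "DW_AT_location": 0xFFFFFFFF82005000}
index = index_variables([decl, defn])
```

Only the defining DIE ends up in the index, so its location is available without consulting the symbol table.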
Omar Sandoval
a856eb4af9 cli: handle specification variable DIEs 2018-06-23 00:20:52 -07:00
Omar Sandoval
a05913298b dwarfindex: split read_die() out of index_cu() 2018-06-22 23:57:41 -07:00
Omar Sandoval
e9596bae4e dwarfindex: relax add_die_hash_entry() memory ordering
Currently, we use the sequentially consistent memory model for all
operations, which isn't necessary. The only ordering requirement is that
a thread which finds an already-used hash table slot sees it initialized
once it sees the tag set. This does get rid of an mfence on x86_64, but
it didn't have any measurable performance improvement.
2018-06-22 20:31:43 -07:00
Omar Sandoval
5e2d70f09e dwarfindex: add file name to DIE index entry key
A name and tag are not always enough to uniquely identify a type or
variable. For example, "struct workspace" in the Linux kernel can refer
to one of at least three types; fs/btrfs/{lzo,zlib,zstd}.c each have
their own struct workspace type. We can, however, also differentiate
DIEs on the file they were declared in.

The naive thing to do would be to include the file name as a string in
the hash table entry. However, that means we must allocate and
canonicalize each path in the line number program header and pay an
extra cache miss plus string comparison when adding a new entry.

We can get rid of the cache miss and string comparison if we instead map
the file name to a unique identifier. The foolproof way to do this would
be to create another big hash table of file names and use the hash table
entry index as the unique identifier. However, for this, we'd still need
to allocate and canonicalize each path as well as worry about another big
hash table.

Once we observe that we can get away with "almost certainly unique"
instead of "truly unique" identifiers, the next logical step is to just
use a hash of the file name as the identifier. With a 64-bit hash and
the ~50k files in the kernel, the probability of a collision is roughly
1 in 10 billion. Even in the extremely unlikely event that there is a collision,
it only matters if the files with colliding names also have colliding
DIEs, which brings things pretty close to the realm of impossibility.

After this change, DwarfIndex.find() returns a list of DIEs matching the
name and tag. The callers will be updated to use the list in upcoming
changes.
2018-06-21 23:13:39 -07:00
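The collision estimate in 5e2d70f09e can be checked with the usual birthday-bound approximation, assuming the hash output is uniformly distributed. The function name is made up for this sketch; the inputs are the 64-bit hash width and the ~50k files from the message.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_items hashes collide.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / 2 / 2**bits).
    """
    pairs = num_items * (num_items - 1) / 2
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)


p = collision_probability(50_000, 64)
odds = 1 / p  # "1 in odds" chance of at least one collision
```

Under these assumptions the odds come out at about 1 in 15 billion, the same order of magnitude as the figure quoted in the message.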
Omar Sandoval
19ddfd77b9 Add C SipHash implementation
For disambiguating symbols which are defined in multiple files, I want
to use a hash to identify file names without needing to compare the file
names themselves. DJBX33A doesn't cut it for this, as 32 bits isn't
enough, and we can afford to spend more cycles to avoid collisions.

SipHash is a good candidate: it produces a 64-bit output, is pretty fast
for short keys, and can be used incrementally. This adds an
implementation of SipHash-1-3 (the cheapest variant) without a key.
2018-06-21 22:54:07 -07:00
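For reference, a compact Python sketch of the SipHash construction that 19ddfd77b9 describes. It is not the C code from the commit. It takes the round counts as parameters so it can be checked against the published SipHash-2-4 test vector, and it reads "without a key" as an all-zero key, which is an assumption.

```python
import struct

MASK = (1 << 64) - 1


def _rotl(x: int, b: int) -> int:
    return ((x << b) | (x >> (64 - b))) & MASK


def _sipround(v0, v1, v2, v3):
    v0 = (v0 + v1) & MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash(data: bytes, key: bytes = bytes(16), c: int = 1, d: int = 3) -> int:
    """SipHash-c-d of data; defaults to SipHash-1-3 with a zero key."""
    k0, k1 = struct.unpack("<QQ", key)
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573
    # Zero-pad to a multiple of 8 bytes; the top byte of the last word
    # holds the message length modulo 256.
    padded = data + bytes(7 - len(data) % 8) + bytes([len(data) & 0xFF])
    for (m,) in struct.iter_unpack("<Q", padded):
        v3 ^= m
        for _ in range(c):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m
    v2 ^= 0xFF
    for _ in range(d):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3


# Hash a file name the way an index key might use it (illustrative only).
file_name_hash = siphash(b"fs/btrfs/lzo.c")
```

Fewer rounds make the 1-3 variant cheaper than 2-4, which suits hashing many short file names.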
Omar Sandoval
4317342686 dwarfindex: split die_object_from_entry() out of DwarfIndex.find() 2018-06-20 23:24:18 -07:00
Omar Sandoval
d86a726e9b dwarfindex: unify index_cu() return code path 2018-06-19 21:54:21 -07:00
Omar Sandoval
ce91dfb9e9 dwarfindex: add macro for resizing array 2018-06-18 21:30:02 -07:00
Omar Sandoval
6d59bc91e4 dwarfindex: read .debug_line
Preparation for the following changes.
2018-06-18 18:25:33 -07:00
Omar Sandoval
a65c78eadc dwarfindex: use enums for tag/children bit masks 2018-06-17 21:58:04 -07:00
Omar Sandoval
ca1670217e dwarfindex: number attribute command enum more nicely
Instead of defining each command number explicitly, just define the
maximum skip command (i.e., the minimum explicit command minus one),
and assert that the maximum explicit command is 255.
2018-06-17 21:10:30 -07:00
Omar Sandoval
d3a544d498 dwarf: add DieAttrib.__repr__()
The attribute and form names are much more useful than the values.
2018-06-16 23:32:32 -07:00
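A sketch of what a name-based __repr__() like the one in d3a544d498 might look like. The field names and the lookup tables here are assumptions for illustration; only the numeric DW_AT_* and DW_FORM_* values come from the DWARF specification.

```python
# Small excerpts of the DWARF constant tables (values per the DWARF spec).
DW_AT = {0x03: "DW_AT_name", 0x0B: "DW_AT_byte_size", 0x49: "DW_AT_type"}
DW_FORM = {0x08: "DW_FORM_string", 0x0B: "DW_FORM_data1", 0x13: "DW_FORM_ref4"}


class DieAttrib:
    """Hypothetical stand-in for the real attribute class."""

    def __init__(self, name: int, form: int, value):
        self.name = name
        self.form = form
        self.value = value

    def __repr__(self) -> str:
        # Fall back to hex for codes missing from the tables.
        name = DW_AT.get(self.name, hex(self.name))
        form = DW_FORM.get(self.form, hex(self.form))
        return f"DieAttrib(name={name}, form={form}, value={self.value!r})"


text = repr(DieAttrib(0x03, 0x08, b"foo"))
```

Printing the symbolic names saves looking up raw codes such as 0x3 and 0x8 by hand while debugging.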
Omar Sandoval
c9e974d8ca dwarf: support getting declaration location of a DIE
I only added this so I could manually check this on DIEs, but it would
be possible to add it to ProgramObject or Type in the future.
2018-06-16 21:52:35 -07:00
Omar Sandoval
856892375a typeindex: handle pointer to const void in from_dwarf_type()
Like pointer types and function types, qualified void types don't have a
DW_AT_type.
2018-06-15 22:36:58 -07:00
Omar Sandoval
8bca8f264c type: fix stray comma in VoidType.__repr__() 2018-06-15 22:36:28 -07:00
Omar Sandoval
cecd7f9f80 helpers: add block layer helpers to iterate over disks and partitions 2018-06-15 22:03:18 -07:00
Omar Sandoval
abcf83cf09 helpers: add kernel dev_t helpers 2018-06-15 22:02:37 -07:00
Omar Sandoval
36a19f52e4 helpers: add fget(), for_each_file(), and print_files() kernel fs helpers 2018-06-13 23:22:24 -07:00
Omar Sandoval
3a580c8de9 helpers: handle dentries with d_dname() in d_path()
For now, return None for these special dentries. In the future, we can
probably special case the common ones, like sockfs and anon_inode, but
this is fine for now.
2018-06-13 22:57:30 -07:00
Omar Sandoval
80cb79adf6 helpers: fix d_path()
It should actually take a struct vfsmount * as documented, and struct
path * should work.
2018-06-13 20:36:15 -07:00
Omar Sandoval
14d82c6b71 helpers: add basic mm helpers
Mostly for translating between page/pfn/virt.
2018-06-12 01:02:09 -07:00
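The page/pfn/virt translations mentioned in 14d82c6b71 are plain arithmetic once the layout constants are known. The sketch below hard-codes values that are only assumptions (typical x86-64 with 4-level paging, no KASLR, and a 64-byte struct page); real helpers would read them from the program being debugged, and the function names are illustrative.

```python
# Assumed layout constants; these vary by architecture and kernel config.
PAGE_SHIFT = 12
PAGE_OFFSET = 0xFFFF880000000000    # start of the direct mapping
VMEMMAP_START = 0xFFFFEA0000000000  # start of the struct page array
SIZEOF_STRUCT_PAGE = 64


def virt_to_pfn(addr: int) -> int:
    """Direct-map virtual address -> page frame number."""
    return (addr - PAGE_OFFSET) >> PAGE_SHIFT


def pfn_to_virt(pfn: int) -> int:
    """Page frame number -> direct-map virtual address."""
    return (pfn << PAGE_SHIFT) + PAGE_OFFSET


def pfn_to_page(pfn: int) -> int:
    """Page frame number -> address of its struct page."""
    return VMEMMAP_START + pfn * SIZEOF_STRUCT_PAGE


def page_to_pfn(page: int) -> int:
    """Address of a struct page -> page frame number."""
    return (page - VMEMMAP_START) // SIZEOF_STRUCT_PAGE


pfn = virt_to_pfn(PAGE_OFFSET + 0x5123)  # offset within the page is dropped
page = pfn_to_page(pfn)
```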
Omar Sandoval
f7cb414383 typeindex: work around enum type DIE on old versions of GCC
GCC before 5.1 (specifically [1]) doesn't include the underlying int
type in enum type DIEs. We can work around this by making one up based
on the size and values of the enum.

1: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=210717
2018-05-29 23:12:34 -07:00
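One way the workaround in f7cb414383 could pick a stand-in type from the enum's size and values. This is a guess at the general approach rather than the committed code; the function name and the size-to-name table (which assumes an LP64 target) are assumptions.

```python
def guess_enum_int_type(size: int, values) -> str:
    """Make up a compatible integer type name for an enum.

    size is the enum's byte size; values are its enumerator values.
    """
    names = {1: "char", 2: "short", 4: "int", 8: "long"}  # LP64 assumed
    base = names[size]
    # Any negative enumerator forces a signed type.
    if any(v < 0 for v in values):
        return "signed char" if size == 1 else base
    return f"unsigned {base}"


unsigned_case = guess_enum_int_type(4, [0, 1, 2])
signed_case = guess_enum_int_type(4, [-1, 0, 1])
```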
Omar Sandoval
7be8701fa4 helpers: rename task() to find_task()
task is too generic and easy to clobber accidentally.
2018-05-29 22:42:43 -07:00
Omar Sandoval
525122896f helpers: add src, dst, and fstype arguments to for_each_mount()
Usually, you only want a specific mount, so this saves having to do the
check manually.
2018-05-29 22:39:50 -07:00
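The filtering added in 525122896f amounts to optional equality checks inside the iterator. The sketch below works on a plain list of tuples instead of kernel mount structures, so the Mount type and the signature are illustrative only.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Mount(NamedTuple):
    src: str
    dst: str
    fstype: str


def for_each_mount(mounts: Iterable[Mount], src: Optional[str] = None,
                   dst: Optional[str] = None,
                   fstype: Optional[str] = None) -> Iterator[Mount]:
    """Yield mounts matching every filter that is not None."""
    for mnt in mounts:
        if ((src is None or mnt.src == src) and
                (dst is None or mnt.dst == dst) and
                (fstype is None or mnt.fstype == fstype)):
            yield mnt


# Made-up mount table.
mounts = [Mount("/dev/sda1", "/", "ext4"),
          Mount("tmpfs", "/tmp", "tmpfs"),
          Mount("/dev/sdb1", "/data", "btrfs")]
btrfs_mounts = list(for_each_mount(mounts, fstype="btrfs"))
```

With no filters given, the iterator yields every mount, so existing callers keep working.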
Omar Sandoval
3bc5b3b4fc helpers: rename mounts() to for_each_mount() 2018-05-29 21:08:30 -07:00
Omar Sandoval
e2ef1b9b27 helpers: support PID hash from before Linux v4.15 2018-05-29 21:05:54 -07:00
Omar Sandoval
fea85e8d72 helpers: add kernel PID helpers
I considered calling this module sched instead of pid but it's really
about PIDs, not tasks.
2018-05-29 00:25:45 -07:00
Omar Sandoval
d4b73c03ac helpers: add kernel VFS helpers 2018-05-28 23:51:37 -07:00
Omar Sandoval
ea121af2ad Add Linux kernel helpers
Beyond providing the raw interface for accessing objects, drgn should
provide helpers for inspecting programs. This introduces the first set
of helpers for the Linux kernel. In the future, it would be cool to have
helpers for common libraries like libstdc++.
2018-05-28 00:08:03 -07:00
Omar Sandoval
32974c3c98 Fix some minor style problems 2018-05-27 22:44:05 -07:00
Omar Sandoval
a7448f4ce6 program: fix mypy errors about builtins._ 2018-05-26 23:43:55 -07:00
Omar Sandoval
5701689436 program: add ProgramObject.read_once() 2018-05-26 23:01:09 -07:00
Omar Sandoval
cc5fb61295 program: add Program.null()
Shorthand for Program.object(type, value=0).
2018-05-26 22:49:32 -07:00
Omar Sandoval
52052959ae program: only convert errors from CompoundType.member() to AttributeError
Errors reading memory, for example, should stay as ValueError. Maybe
these should have more specific exception types in the future.
2018-05-26 22:44:25 -07:00
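The narrowing described in 52052959ae can be illustrated with a toy pair of classes: only the member lookup sits inside the try block, so a failed memory read propagates unchanged. The classes below are simplified stand-ins for the real ones, and the reader callback is hypothetical.

```python
class CompoundType:
    def __init__(self, members):
        self._members = members  # member name -> byte offset

    def member(self, name):
        try:
            return self._members[name]
        except KeyError:
            raise ValueError(f"no member named {name!r}") from None


class ProgramObject:
    def __init__(self, type_, read):
        self.type_ = type_
        self._read = read  # callable(offset); may raise ValueError

    def __getattr__(self, name):
        # Only a failed member lookup becomes AttributeError.
        try:
            offset = self.type_.member(name)
        except ValueError as e:
            raise AttributeError(str(e)) from None
        # Outside the try block: read errors stay ValueError.
        return self._read(offset)


def failing_read(offset):
    raise ValueError("could not read memory")


obj = ProgramObject(CompoundType({"a": 0}), failing_read)
missing_is_attribute_error = not hasattr(obj, "b")
```

Keeping the two failure modes distinct means hasattr() and getattr() with a default only hide missing members, not bad reads.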
Omar Sandoval
906f16832f program: make ProgramObject() address optional
This way, rvalues can be more conveniently created with
prog.object(type, value=x).
2018-05-26 17:48:52 -07:00
Omar Sandoval
2ee6662639 program: handle pointers to typedefs of struct/union types
ProgramObject.__dir__(), member_(), and container_of_() all check that
the relevant type is a struct or union, but they all need to allow
typedefs of structs or unions, as well.
2018-05-25 22:32:03 -07:00
Omar Sandoval
9c94b82c6c program: make CompoundType.member() public
ProgramObject.member_() calls typeof() and offsetof(), which we can get
at the same time. While we're here, catch AttributeError instead of
doing the isinstance() check, which is slightly more efficient.
2018-05-25 22:17:54 -07:00
Omar Sandoval
6b178c5108 program: only check struct or union in ProgramObject.member_()
We're currently also doing this check in __getattr__(), which is
unnecessary overhead in the common case. We can just check the exception
in the error case.
2018-05-25 21:56:24 -07:00
Omar Sandoval
95c0682718 corereader: implement type reads in C
Now that read_memory() was converted to C, IntType.read() and friends
are a bottleneck. Convert them to C methods of CoreReader.
2018-05-25 00:41:12 -07:00