255 Commits

Author SHA1 Message Date
Omar Sandoval
dd149c6d4f helpers: remove leading slash from dentry_path()
The path returned is relative to the root of the filesystem, not an
absolute path, so remove the misleading leading slash.
2018-07-02 23:51:00 -07:00
Omar Sandoval
2b22fe0a29 helpers: simplify PID enum handling 2018-07-02 23:48:49 -07:00
Omar Sandoval
b20d544242 helpers: add list_for_each{,_entry}_reverse() 2018-07-02 23:48:32 -07:00
Omar Sandoval
06064fa988 elf: fix style error 2018-06-30 22:10:21 -07:00
Omar Sandoval
50f52ec295 program: make ProgramObject() arguments mandatory keywords
It's too hard to remember whether value or address comes first, so make
sure we're always explicit which one.
2018-06-28 23:40:47 -07:00
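A minimal sketch of what "mandatory keywords" means in Python, assuming a stand-in class rather than the project's real ProgramObject (the `type_name` parameter and the validation below are hypothetical). A bare `*` in the signature makes every parameter after it keyword-only, so the caller can no longer pass value or address positionally:

```python
class ProgramObject:
    """Hypothetical stand-in; not the real class from this repository."""

    def __init__(self, type_name, *, value=None, address=None):
        # Everything after the bare "*" must be passed by keyword.
        # Assumption for this sketch: exactly one of the two is given.
        if (value is None) == (address is None):
            raise ValueError("pass exactly one of value= or address=")
        self.type_name = type_name
        self.value = value
        self.address = address


obj = ProgramObject("int", value=42)            # explicit, unambiguous
ptr = ProgramObject("int", address=0xFFFF0000)  # explicit, unambiguous
```

Calling `ProgramObject("int", 42)` raises TypeError, which is the point: the call site has to say which of the two it means.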
Omar Sandoval
0d4c4d159b variableindex: include enumerators in variable search 2018-06-28 23:07:13 -07:00
Omar Sandoval
9d24bc9c5c dwarfindex: make DwarfIndex.find() tag optional 2018-06-28 21:53:00 -07:00
Omar Sandoval
d9e6fa94b5 dwarfindex: index enumerator names
The enumerator DIE itself doesn't let us get at the type, though, so
hack it to point at the enumeration_type DIE.
2018-06-28 21:47:27 -07:00
Omar Sandoval
cea27cc0a0 variableindex: support disambiguating variables by filename 2018-06-28 21:16:06 -07:00
Omar Sandoval
aeb767d43e Replace lookup_variable callback with new VariableIndex class
Similar to how we replaced the lookup_type callback with TypeIndex.
2018-06-28 21:05:21 -07:00
Omar Sandoval
02c75770ea typeindex: support disambiguating types by filename 2018-06-28 19:34:10 -07:00
Omar Sandoval
263f8c2c4d dwarf: update DWARF constants
Update generate_dwarf_constants.py to parse dwarf.h, and update it with
the dwarf.h on my system.
2018-06-27 22:31:01 -07:00
Omar Sandoval
2a76972e68 Get symbol address from DWARF/ELF info instead of /proc/kallsyms
This has a few benefits:

1. We no longer have to parse /proc/kallsyms, which actually takes just
   as long as parsing all of the DWARF files
2. We can support vmcores
3. We can find the address of a specific variable even if it has the
   same name as other variables in the same object file
2018-06-27 21:45:36 -07:00
Omar Sandoval
266b35a7ae dwarfindex/elf: save file path in ElfFile 2018-06-27 19:51:06 -07:00
Omar Sandoval
0adac0747c Bring back ElfFile
Instead of constructing DwarfFile with a dict of sections, pass in an
ElfFile.
2018-06-25 23:43:14 -07:00
Omar Sandoval
68ebade7fa dwarfindex: fix number of symbols calculation 2018-06-25 21:47:53 -07:00
Omar Sandoval
d88ad8bd1d cli: read OS release from vmcoreinfo instead of uname
This is preparation for a couple of big future changes:

1. Using the ELF symbol table instead of /proc/kallsyms
2. Supporting vmcore crash dumps
2018-06-25 20:52:52 -07:00
Omar Sandoval
8340afb03b corereader: add context manager methods 2018-06-25 19:59:22 -07:00
Omar Sandoval
15c3db0936 corereader: support reading from physical addresses
/proc/kcore and vmcores include the physical memory address in the
program headers. Reading from physical memory can be useful, so support
it in CoreReader.
2018-06-25 00:41:29 -07:00
Omar Sandoval
26d3880708 dwarfindex: move indexing to a method instead of __init__()
We want to allow indexing files on the fly instead of all up front, so
move the indexing to a new DwarfIndex.add() method.
2018-06-25 00:25:49 -07:00
Omar Sandoval
6841eaace4 dwarfindex: replace cu->file and die_hash_entry->cu pointers with indices
We need this in order to be able to reallocate the files and cus arrays.
It also shrinks the size of struct die_hash_entry back to what it was
before file_name_hash was added.
2018-06-24 19:57:56 -07:00
Omar Sandoval
e2caf2bb0d dwarfindex: don't index any declarations and handle specifications
For variables which are predeclared, GCC generates a DW_TAG_variable DIE
with DW_AT_name and DW_AT_declaration as well as a DW_TAG_variable DIE
without DW_AT_name but with DW_AT_specification pointing to the
declaration DIE. We should index the latter, not the former. This has a
couple of benefits: we can skip indexing variable declaration DIEs,
which contribute a lot of duplicate hash table insertions; and, we can
always get the address of a variable from DW_AT_location of the indexed
DIE instead of having to parse the symbol table.
2018-06-23 00:23:30 -07:00
Omar Sandoval
a856eb4af9 cli: handle specification variable DIEs 2018-06-23 00:20:52 -07:00
Omar Sandoval
a05913298b dwarfindex: split read_die() out of index_cu() 2018-06-22 23:57:41 -07:00
Omar Sandoval
e9596bae4e dwarfindex: relax add_die_hash_entry() memory ordering
Currently, we use the sequentially consistent memory model for all
operations, which isn't necessary. The only ordering requirement is that
a thread which finds an already-used hash table slot sees it initialized
once it sees the tag set. This does get rid of an mfence on x86_64, but
it didn't have any measurable performance improvement.
2018-06-22 20:31:43 -07:00
Omar Sandoval
5e2d70f09e dwarfindex: add file name to DIE index entry key
A name and tag are not always enough to uniquely identify a type or
variable. For example, "struct workspace" in the Linux kernel can refer
to one of at least three types; fs/btrfs/{lzo,zlib,zstd}.c each have
their own struct workspace type. We can, however, also differentiate
DIEs on the file they were declared in.

The naive thing to do would be to include the file name as a string in
the hash table entry. However, that means we must allocate and
canonicalize each path in the line number program header and pay an
extra cache miss plus string comparison when adding a new entry.

We can get rid of the cache miss and string comparison if we instead map
the file name to a unique identifier. The foolproof way to do this would
be to create another big hash table of file names and use the hash table
entry index as the unique identifier. However, for this, we'd still need
to allocate and canonicalize each path as well as worry about another big
hash table.

Once we observe that we can get away with "almost certainly unique"
instead of "truly unique" identifiers, the next logical step is to just
use a hash of the file name as the identifier. With a 64-bit hash and
the ~50k files in the kernel, the probability of a collision is roughly 1
in 10 billion. Even in the extremely unlikely event that there is a collision,
it only matters if the files with colliding names also have colliding
DIEs, which brings things pretty close to the realm of impossibility.

After this change, DwarfIndex.find() returns a list of DIEs matching the
name and tag. The callers will be updated to use the list in upcoming
changes.
2018-06-21 23:13:39 -07:00
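The collision estimate in the message above can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)) for n items hashed uniformly into N values. This is a sketch under the assumption that the hash behaves like a uniform random function; the function name is made up for illustration:

```python
import math


def collision_probability(num_items, hash_bits):
    """Birthday-bound estimate of at least one collision among num_items
    values drawn uniformly from a space of 2**hash_bits."""
    space = 2 ** hash_bits
    exponent = num_items * (num_items - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


p64 = collision_probability(50_000, 64)
p32 = collision_probability(50_000, 32)
print(f"64-bit: {p64:.2e} (about 1 in {1 / p64:.2e})")
print(f"32-bit: {p32:.2f}")
```

For ~50k file names the 64-bit result is on the order of 1 in 10 billion, consistent with the figure quoted above, while a 32-bit hash would collide with a probability of roughly one in four.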
Omar Sandoval
19ddfd77b9 Add C SipHash implementation
For disambiguating symbols which are defined in multiple files, I want
to use a hash to identify file names without needing to compare the file
names themselves. DJBX33A doesn't cut it for this, as 32 bits isn't
enough, and we can afford to spend more cycles to avoid collisions.

SipHash is a good candidate: it produces a 64-bit output, is pretty fast
for short keys, and can be used incrementally. This adds an
implementation of SipHash-1-3 (the cheapest variant) without a key.
2018-06-21 22:54:07 -07:00
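For reference, a pure-Python sketch of the SipHash construction with configurable round counts. It is not the C implementation added by this commit: it hashes a whole buffer at once rather than incrementally, and it assumes that "without a key" means both 64-bit key halves are zero, which is why `k0` and `k1` default to 0. Only the default parameters correspond to SipHash-1-3:

```python
MASK = 0xFFFFFFFFFFFFFFFF  # all arithmetic is modulo 2**64


def _rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK


def _sipround(v):
    v0, v1, v2, v3 = v
    v0 = (v0 + v1) & MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return [v0, v1, v2, v3]


def siphash(data, k0=0, k1=0, c_rounds=1, d_rounds=3):
    """SipHash-c-d of a bytes object; defaults give unkeyed SipHash-1-3."""
    v = [k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D,
         k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573]
    full = len(data) - len(data) % 8
    words = [int.from_bytes(data[i:i + 8], "little")
             for i in range(0, full, 8)]
    # Final word: leftover bytes, with the length mod 256 in the top byte.
    words.append(int.from_bytes(data[full:], "little")
                 | ((len(data) & 0xFF) << 56))
    for m in words:
        v[3] ^= m
        for _ in range(c_rounds):
            v = _sipround(v)
        v[0] ^= m
    v[2] ^= 0xFF
    for _ in range(d_rounds):
        v = _sipround(v)
    return v[0] ^ v[1] ^ v[2] ^ v[3]


file_id = siphash(b"fs/btrfs/zlib.c")  # 64-bit identifier for a file name
```

Passing `c_rounds=2, d_rounds=4` with a real key turns the same code into SipHash-2-4, which is convenient for checking it against the published test vector.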
Omar Sandoval
4317342686 dwarfindex: split die_object_from_entry() out of DwarfIndex.find() 2018-06-20 23:24:18 -07:00
Omar Sandoval
d86a726e9b dwarfindex: unify index_cu() return code path 2018-06-19 21:54:21 -07:00
Omar Sandoval
ce91dfb9e9 dwarfindex: add macro for resizing array 2018-06-18 21:30:02 -07:00
Omar Sandoval
6d59bc91e4 dwarfindex: read .debug_line
Preparation for the following changes.
2018-06-18 18:25:33 -07:00
Omar Sandoval
a65c78eadc dwarfindex: use enums for tag/children bit masks 2018-06-17 21:58:04 -07:00
Omar Sandoval
ca1670217e dwarfindex: number attribute command enum more nicely
Instead of defining each command number explicitly, just define the
maximum skip command (i.e., the minimum explicit command minus one),
and assert that the maximum explicit command is 255.
2018-06-17 21:10:30 -07:00
Omar Sandoval
d3a544d498 dwarf: add DieAttrib.__repr__()
The attribute and form names are much more useful than the values.
2018-06-16 23:32:32 -07:00
Omar Sandoval
c9e974d8ca dwarf: support getting declaration location of a DIE
I only added this so I could manually check this on DIEs, but it would
be possible to add it to ProgramObject or Type in the future.
2018-06-16 21:52:35 -07:00
Omar Sandoval
856892375a typeindex: handle pointer to const void in from_dwarf_type()
Like pointer types and function types, qualified void types don't have a
DW_AT_type.
2018-06-15 22:36:58 -07:00
Omar Sandoval
8bca8f264c type: fix stray comma in VoidType.__repr__() 2018-06-15 22:36:28 -07:00
Omar Sandoval
cecd7f9f80 helpers: add block layer helpers to iterate over disks and partitions 2018-06-15 22:03:18 -07:00
Omar Sandoval
abcf83cf09 helpers: add kernel dev_t helpers 2018-06-15 22:02:37 -07:00
Omar Sandoval
36a19f52e4 helpers: add fget(), for_each_file(), and print_files() kernel fs helpers 2018-06-13 23:22:24 -07:00
Omar Sandoval
3a580c8de9 helpers: handle dentries with d_dname() in d_path()
For now, return None for these special dentries. In the future, we can
probably special case the common ones, like sockfs and anon_inode, but
this is fine for now.
2018-06-13 22:57:30 -07:00
Omar Sandoval
80cb79adf6 helpers: fix d_path()
It should actually take a struct vfsmount * as documented, and struct
path * should work.
2018-06-13 20:36:15 -07:00
Omar Sandoval
14d82c6b71 helpers: add basic mm helpers
Mostly for translating between page/pfn/virt.
2018-06-12 01:02:09 -07:00
Omar Sandoval
f7cb414383 typeindex: work around enum type DIE on old versions of GCC
GCC before 5.1 (specifically [1]) doesn't include the underlying int
type in enum type DIEs. We can work around this by making one up based
on the size and values of the enum.

1: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=210717
2018-05-29 23:12:34 -07:00
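One way such a workaround could look, as a sketch only: the function name and the (size, signed) return shape are assumptions for illustration, not the code in typeindex. The idea is to keep the enum's byte size and call the made-up integer type signed only if some enumerator is negative:

```python
def guess_enum_underlying_type(size, values):
    """Return (size_in_bytes, is_signed) for a made-up underlying int type.

    size is the enum's byte size from the DIE; values are its enumerators.
    """
    signed = any(v < 0 for v in values)
    bits = size * 8
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    # Sanity check: every enumerator must fit in the type we made up.
    if any(not lo <= v <= hi for v in values):
        raise ValueError("enumerator does not fit in guessed type")
    return size, signed


unsigned_guess = guess_enum_underlying_type(4, [0, 1, 2])
signed_guess = guess_enum_underlying_type(4, [-1, 0, 1])
```

An enum with only non-negative enumerators is treated as unsigned here; whether that matches the compiler's actual choice in every case is not something this sketch can guarantee.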
Omar Sandoval
7be8701fa4 helpers: rename task() to find_task()
task is too generic and easy to clobber accidentally.
2018-05-29 22:42:43 -07:00
Omar Sandoval
525122896f helpers: add src, dst, and fstype arguments to for_each_mount()
Usually, you only want a specific mount, so this saves having to do the
check manually.
2018-05-29 22:39:50 -07:00
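The filtering described above can be sketched as a generator with optional arguments. This is an illustration over an in-memory list, not the real helper, which walks kernel data structures; the `Mount` tuple and the `mounts` parameter are hypothetical:

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Mount(NamedTuple):
    """Hypothetical record standing in for a kernel mount."""
    src: str
    dst: str
    fstype: str


def for_each_mount(mounts: Iterable[Mount],
                   src: Optional[str] = None,
                   dst: Optional[str] = None,
                   fstype: Optional[str] = None) -> Iterator[Mount]:
    """Yield mounts matching every filter that was given (None = any)."""
    for mount in mounts:
        if src is not None and mount.src != src:
            continue
        if dst is not None and mount.dst != dst:
            continue
        if fstype is not None and mount.fstype != fstype:
            continue
        yield mount


table = [
    Mount("/dev/sda1", "/", "ext4"),
    Mount("/dev/sdb1", "/data", "btrfs"),
    Mount("tmpfs", "/tmp", "tmpfs"),
]
btrfs_mounts = list(for_each_mount(table, fstype="btrfs"))
root_mounts = list(for_each_mount(table, dst="/"))
```

With no filters the generator yields every mount, so existing callers that did the check manually keep working unchanged.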
Omar Sandoval
3bc5b3b4fc helpers: rename mounts() to for_each_mount() 2018-05-29 21:08:30 -07:00
Omar Sandoval
e2ef1b9b27 helpers: support PID hash from before Linux v4.15 2018-05-29 21:05:54 -07:00
Omar Sandoval
fea85e8d72 helpers: add kernel PID helpers
I considered calling this module sched instead of pid but it's really
about PIDs, not tasks.
2018-05-29 00:25:45 -07:00
Omar Sandoval
d4b73c03ac helpers: add kernel VFS helpers 2018-05-28 23:51:37 -07:00