mirror of
https://github.com/JakeHillion/drgn.git
synced 2024-12-23 17:53:07 +00:00
5e2d70f09e
A name and tag are not always enough to uniquely identify a type or variable. For example, "struct workspace" in the Linux kernel can refer to one of at least three types; fs/btrfs/{lzo,zlib,zstd}.c each have their own struct workspace type. We can, however, also differentiate DIEs by the file they were declared in.

The naive thing to do would be to include the file name as a string in the hash table entry. However, that means we must allocate and canonicalize each path in the line number program header and pay an extra cache miss plus string comparison when adding a new entry.

We can get rid of the cache miss and string comparison if we instead map the file name to a unique identifier. The foolproof way to do this would be to create another big hash table of file names and use the hash table entry index as the unique identifier. However, for this, we'd still need to allocate and canonicalize each path as well as worry about another big hash table.

Once we observe that we can get away with "almost certainly unique" instead of "truly unique" identifiers, the next logical step is to just use a hash of the file name as the identifier. With a 64-bit hash and the ~50k files in the kernel, the probability of any collision is roughly 1 in 10 billion. Even in the extremely unlikely event that there is a collision, it only matters if the files with colliding names also declare DIEs with the same name and tag, which brings things pretty close to the realm of impossibility.

After this change, DwarfIndex.find() returns a list of DIEs matching the name and tag. The callers will be updated to use the list in upcoming changes.
__init__.py
test_corereader.py
test_leb128.py
test_memberdesignator.py
test_program.py
test_rlcompleter.py
test_type.py
test_typeindex.py
test_typename.py
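
The file-name hashing idea from the commit message can be illustrated with a small Python sketch. This is not drgn's implementation: `ToyIndex`, `file_name_hash`, and `collision_probability` are hypothetical names, BLAKE2b is an arbitrary stand-in for whatever 64-bit hash the real code uses, and for brevity the path is normalized as a string, which the actual change may well avoid.

```python
import hashlib
import posixpath
from collections import defaultdict

DW_TAG_structure_type = 0x13  # DWARF tag value for struct types


def file_name_hash(path: str) -> int:
    """Map a declaration file path to a 64-bit identifier (hypothetical helper)."""
    # Simplification: normalize the path as a string before hashing.
    canonical = posixpath.normpath(path)
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyIndex:
    """Toy index: (name, tag) -> list of (file hash, DIE offset) pairs."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, name, tag, path, die_offset):
        file_hash = file_name_hash(path)
        bucket = self._entries[(name, tag)]
        # Same name, tag, and file hash: treat as the same declaration.
        # Only 64-bit integers are compared here, never path strings.
        if all(existing != file_hash for existing, _ in bucket):
            bucket.append((file_hash, die_offset))

    def find(self, name, tag):
        """Return every DIE offset recorded for this name and tag."""
        return [offset for _, offset in self._entries.get((name, tag), [])]


def collision_probability(n_files: int, hash_bits: int = 64) -> float:
    """Birthday approximation: number of file pairs divided by hash space size."""
    return n_files * (n_files - 1) / 2 / 2**hash_bits


index = ToyIndex()
for i, source in enumerate(("lzo.c", "zlib.c", "zstd.c")):
    index.add("workspace", DW_TAG_structure_type, "fs/btrfs/" + source, 0x100 * i)
# A second sighting of the lzo.c declaration does not create a new entry.
index.add("workspace", DW_TAG_structure_type, "fs/btrfs/./lzo.c", 0x900)

matches = index.find("workspace", DW_TAG_structure_type)
probability = collision_probability(50_000)
```

Here `matches` holds three offsets, one per btrfs source file, mirroring the list-returning behaviour described for DwarfIndex.find(). Under the birthday approximation, `probability` comes out at about 6.8e-11 for 50,000 files, the same order of magnitude as the 1 in 10 billion figure quoted in the commit message.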