d531da6f8a
1 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Graham Christensen
|
fd045173ce |
referencesByPopularity: init to sort packages by a cachability heuristic
Using a simple algorithm, convert the references to a path in to a sorted list of dependent paths based on how often they're referenced and how deep in the tree they live. Equally-"popular" paths are then sorted by name. The existing writeReferencesToFile prints the paths in a simple ascii-based sorting of the paths. Sorting the paths by graph improves the chances that the difference between two builds appear near the end of the list, instead of near the beginning. This makes a difference for Nix builds which export a closure for another program to consume, if that program implements its own level of binary diffing. For an example, Docker Images. If each store path is a separate layer then Docker Images can be very efficiently transfered between systems, and we get very good cache reuse between images built with the same version of Nixpkgs. However, since Docker only reliably supports a small number of layers (42) it is important to pick the individual layers carefully. By storing very popular store paths in the first 40 layers, we improve the chances that the next Docker image will share many of those layers.* Given the dependency tree: A - B - C - D -\ \ \ \ \ \ \ \ \ \ \ - E ---- F \- G Nodes which have multiple references are duplicated: A - B - C - D - F \ \ \ \ \ \- E - F \ \ \ \- E - F \ \- G Each leaf node is now replaced by a counter defaulted to 1: A - B - C - D - (F:1) \ \ \ \ \ \- E - (F:1) \ \ \ \- E - (F:1) \ \- (G:1) Then each leaf counter is merged with its parent node, replacing the parent node with a counter of 1, and each existing counter being incremented by 1. That is to say `- D - (F:1)` becomes `- (D:1, F:2)`: A - B - C - (D:1, F:2) \ \ \ \ \ \- (E:1, F:2) \ \ \ \- (E:1, F:2) \ \- (G:1) Then each leaf counter is merged with its parent node again, merging any counters, then incrementing each: A - B - (C:1, D:2, E:2, F:5) \ \ \ \- (E:1, F:2) \ \- (G:1) And again: A - (B:1, C:2, D:3, E:4, F:8) \ \- (G:1) And again: (A:1, B:2, C:3, D:4, E:5, F:9, G:2) and then paths have the following "popularity": A 1 B 2 C 3 D 4 E 5 F 9 G 2 and the popularity contest would result in the paths being printed as: F E D C B G A * Note: People who have used a Dockerfile before assume Docker's Layers are inherently ordered. However, this is not true -- Docker layers are content-addressable and are not explicitly layered until they are composed in to an Image. |