This is based on Jeff's most excellent work to identify why
non-recursive listing under postgreskv was phenomenally slow. It turns
out PostgreSQL's query planner was actually using two sequential scans
of the pathdata table to do its job. It's unclear for how long that has
been happening, but obviously it won't scale any further.
The main change is propagating bucket association with pathnames through
the CTE so that the query planner lets itself use the pathdata index on
(bucket, fullpath) for the skipping-forward part.
Jeff also had some changes to the range ends to keep NULL from being
used- I believe with the intent of making sure the query planner was
able to use the pathdata index. My tests on postgres 9.6 and 11
indicate that those changes don't make any appreciable difference in
performance or query plan, so I'm going to leave them off for now to
avoid a careful audit of the semantic differences.
There is a test included here, which only serves to check that the new
version of the function is indeed active. To actually ensure that no
sequential scans are being used in the query plan anymore, our tests
would need to be run against a test db with lots of data already loaded
in it, and that isn't feasible for now.
Change-Id: Iffe9a1f411c54a2f742a4abb8f2df0c64fd662cb
* initial test
* add parenthesis
* remove pipeline
* add few todos
* use docker image for environment
* use pipeline
* fix
* add missing steps
* invoke with bash
* disable protoc
* try using golang image
* try as root
* Disable install-awscli.sh temporarily
* Debugging
* debugging part 2
* Set absolute path for debugging
* Remove absolute path
* Dont run as root
* Install unzip
* Dont forget to apt-get update
* Put into folder that is in PATH
* disable IPv6 Test
* add verbose info and check protobuf
* make integration non-parallel
* remove -v and make checkout part of build
* make a single block for linting
* fix echo
* update
* try using things directly
* try add xunit output
* fix name
* don't print empty lines
* skip testsuites without any tests
* remove coverage, because it's not showing the right thing
* try using dockerfile
* fix deb source
* fix typos
* setup postgres
* use the right flag
* try using postgresdb
* expose different port
* remove port mapping
* start postgres
* export
* use env block
* try using different host for integration tests
* eat standard ports
* try building images and binaries
* remove if statement
* add steps
* do before verification
* add go get goversioninfo
* make separate jenkinsfile
* add check
* don't add empty packages
* disable logging to reduce output size
* add timeout
* add comment about mfridman
* Revert Absolute Path
* Add aws to PATH
* PATH Changes
* Docker Env Fixes
* PATH Simplification
* Debugging the PATH
* Debug Logs
* Debugging
* Update PATH Handling
* Rename
* revert changes to Jenkinsfile
..although it ought to work for other storage.KeyValueStore needs as
well. it's just optimized to work pretty well for a largish hierarchy of
paths.
This includes the addition of "long benchmarks" for KeyValueStore
testing. These will only be run when -test-bench-long is added to the
test flags. In these benchmarks, a large corpus of paths matching a
natural ("real-life") hierarchy is read from paths.data.gz (which you
can get from https://github.com/storj/path-test-corpus) and imported
into a particular KeyValueStore. Recursive and non-recursive queries are
run on it to detect performance problems that arise only at scale.
This also includes alternate implementation of the postgreskv client,
which works in a less-bizarre way for non-recursive queries, but suffers
from poor performance in tests such as the long benchmarks. Once this
alternate impl is committed to the tree, we can remove it again; I just
want it to be available for future reference.