The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.
Website | Documentation | Paper | Slides | Python wrapper | A³ Lab
run:
g++ project-benchmark.cpp -std=c++17 -I./include -o ./project-benchmark -lpthread wormhole/libwh.so -O3This builds the tester, two python files are also provided, and can be used for:
- running a very long benchmark
- rearranging the data from it to a nice format that cna be used easily with msexcel to create graphs.
This is a header-only library. It does not need to be installed. Just clone the repo with
git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-indexand copy the include/pgm directory to your system's or project's include path.
The examples/simple.cpp file shows how to index and query a vector of random integers with the PGM-index:
#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm/pgm_index.hpp"
int main() {
// Generate some random data
std::vector<int> data(1000000);
std::generate(data.begin(), data.end(), std::rand);
data.push_back(42);
std::sort(data.begin(), data.end());
// Construct the PGM-index
const int epsilon = 128; // space-time trade-off parameter
pgm::PGMIndex<int, epsilon> index(data);
// Query the PGM-index
auto q = 42;
auto range = index.search(q);
auto lo = data.begin() + range.lo;
auto hi = data.begin() + range.hi;
std::cout << *std::lower_bound(lo, hi, q);
return 0;
}Run and edit this and other examples on Repl.it. Or run it locally via:
g++ examples/simple.cpp -std=c++17 -I./include -o simple
./simpleOther than the pgm::PGMIndex class in the example above, this library provides the following classes:
pgm::DynamicPGMIndexsupports insertions and deletions.pgm::MultidimensionalPGMIndexstores points in k dimensions and supports orthogonal range queries.pgm::MappedPGMIndexstores data on disk and uses a PGMIndex for fast search operations.pgm::CompressedPGMIndexcompresses the segments to reduce the space usage of the index.pgm::OneLevelPGMIndexuses a binary search on the segments rather than a recursive structure.pgm::BucketingPGMIndexuses a top-level lookup table to speed up the search on the segments.pgm::EliasFanoPGMIndexuses a top-level succinct structure to speed up the search on the segments.
The full documentation is available here.
After cloning the repository, build the project with
cmake . -DCMAKE_BUILD_TYPE=Release
make -j8The test runner will be placed in test/. The tuner executable will be placed in tuner/. The benchmark executable will be placed in benchmark/.
This project is licensed under the terms of the Apache License 2.0.
If you use the library please put a link to the website and cite the following paper:
Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.
@article{Ferragina:2020pgm,
Author = {Paolo Ferragina and Giorgio Vinciguerra},
Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
Year = {2020},
Volume = {13},
Number = {8},
Pages = {1162--1175},
Doi = {10.14778/3389133.3389135},
Url = {https://pgm.di.unipi.it},
Issn = {2150-8097},
Journal = {{PVLDB}}}