Artifacts for "GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs"
Github link: https://github.com/ds-umass/GraphMini/
- 128GB of free RAM to preprocess the graph Friendster correctly.
- 180GB of free disk space to store preprocessed graphs.
- Ubuntu 22.04 (tested)
- CMake (Version >= 3.20)
- Make
- MPI
- GCC compiler (>= 7)
- Python (>= 3.8)
- bc
- clang-format (optional)
sudo apt install bc cmake mpich clang-format -y- Wiki
- YouTube
- Patents
- LiveJournal
- Orkut
- Friendster
All graphs are treated as unlabelled and undirected graphs. We also provide scripts to download and preprocess these data automatically. See details below.
- Datasets
    Subfolders:
    - Dryadic (preprocessed graph data for Dryadic)
    - GraphPi (preprocessed graph data for GraphPi)
    - GraphMini (preprocessed graph data for GraphMini)
    - TXT (downloaded graph data from SNAP)
    Scripts: the following scripts are used to download and preprocess datasets used in the experiments
    - download.sh (downloads data from SNAP)
    - dryadic_prep.sh scripts for preprocessing datasets into format can be handled by Dryadic
    - graphpi.sh scripts for preprocessing datasets into format can be handled by GraphPi
    - graphmini_prep.sh scripts for preprocessing datasets into format can be handled by GraphMini
    - prep.sh run all the scripts above
- Dryadic (source code/binaries of Dryadic)
- GraphPi (source code/binaries of GraphPi)
- GraphMini (source code/binaries of GraphMini)
- Experiments 
    Subfolders:
    - Dryadic (experiment results for Dryadic)
    - GraphMini (experiment results for GraphMini)
    - GraphPi (experiment results for GraphPi)
    Scripts: the following scripts are used to automate running the benchmark experiments that appeared in the paper
    - queries.sh (defines what queries and graphs to run in the experiment)
    - run_dryadic_edge.sh (test dryadic on edge-induced pattern defined in queries.sh)
    - run_dryadic_vertex.sh (test dryadic on vertex-induced pattern defined in queries.sh)
    - run_graphpi_edge_iep.sh (test graphpi on edge-induced pattern defined in queries.sh, with inclusion-exclusion optimization)
    - run_graphpi_edge.sh (test graphpi on edge-induced pattern defined in queries.sh, without inclusion-exclusion optimization)
    - run_graphmini_vertex.sh (test GraphMini on vertex-induced pattern defined in queries.sh, without inclusion-exclusion optimization)
    - run_graphmini_edge.sh (test GraphMini on edge-induced pattern defined in queries.sh, without inclusion-exclusion optimization)
    - run_graphmini_edge_iep.sh (test GraphMini on edge-induced pattern defined in queries.sh, with inclusion-exclusion optimization)
    - run_base_vertex.sh (test Base on vertex-induced pattern defined in queries.sh, without inclusion-exclusion optimization)
    - run_base_edge.sh (test Base on edge-induced pattern defined in queries.sh, without inclusion-exclusion optimization)
    - run_base_edge_iep.sh (test Base on edge-induced pattern defined in queries.sh, with inclusion-exclusion optimization)- Build the source code for Dryadic, GraphPi and GraphMini:
bash ./build.sh- Download the dataset from SNAP and preprocess the datasets for Dryadic, GraphPi, and GraphMini:
bash ./Datasets/prep.sh- Define the queries to run and graphs to test by modifying the following file
./Experiments/queries.shThe Queries variable defines the queries to test. It takes inputs in (undirected) adjacency matrix format.
The QuerySizes variable defines the number of vertex in each of the query patterns.
The GraphNames defines the data graphs to run the experiments. Examples are given in the script. Notice that the default setting will run experiments on all the graphs, which can take a significant amount of time, you can modify the GraphNames to run experiments on graphs of interest only.
The TIMEOUT defines the maximum amount of time before terminating a query. Notice that some queries can run for a very long time (>24h). You can modify the "TIMEOUT" variable to change the upper bound of query execution time. GraphMini can finish most queries in less than 12 hours on a 32-core system.
- Reproduce the benchmark experiments:
nohup bash ./Experiments/run.sh &This script will run all the tested systems (GraphPi, Dryadic, GraphMini, Base) by invoking all the tested scripts in the directory (e.g. run_graphmini_vertex.sh). Each script can take more than 3 days to run. So if you want to get results quickly, you can use multiple machines and run them separately by invoking each script one by one.
The script will generate a log file for each tested query. The path to the log file is named as:
./Experiments/{System}_{Graph}_{Query}_{QueryType}/run.log
- Generating Tables
To generate CSV tables corresponding to "Fig 5: GraphMini vs State-of-The-Art"
Fig 5a:
cd Experiments && python3 log2csv.py --baseline=Dryadic --target=GraphMini --adjtype=VertexInducedFig 5b:
cd Experiments && python3 log2csv.py --baseline=Dryadic --target=GraphMini --adjtype=EdgeInducedFig 5c:
cd Experiments && python3 log2csv.py --baseline=GraphPi --target=GraphMini --adjtype=EdgeInducedFig 5d:
cd Experiments && python3 log2csv.py --baseline=GraphPi --target=GraphMini --adjtype=EdgeInducedIEPTo generate tables corresponding to "Fig 6: GraphMini vs Base" Fig 6a:
cd Experiments && python3 log2csv.py --baseline=Base --target=GraphMini --adjtype=EdgeInducedFig 6b:
cd Experiments && python3 log2csv.py --baseline=Base --target=GraphMini --adjtype=VertexInducedSee more detail in GraphMini/README.md