Added all the competing libraries to the list in the README. Added
additional dependencies that are required for running 'make setup'
directly. The dependency list may not be exhaustive; it still needs to
be cross-checked on different platforms.
Changes to be committed:
modified: README.md
README.md: 70 additions & 14 deletions
@@ -12,22 +12,90 @@ The system has several key attributes that lead to its highly and easily customi
12
12
13
13
14
14
Quick links to this file:
15
+
*[Competing libraries](#competing-libraries)
15
16
*[Prerequisites](#prerequisites)
16
17
*[Running](#running)
17
18
*[Directory Structure](#directory-structure)
18
19
*[Getting the datasets](#getting-the-datasets)
19
20
*[Configuration](#configuration)
20
-
*[Competing libraries](#competing-libraries)
21
21
*[Citation details](#citation-details)
22
22
23
+
## Competing libraries
24
+
25
+
Machine learning toolkits:
26
+
*[mlpack](http://mlpack.org)
27
+
*[Shogun-toolbox](http://shogun-toolbox.org)
28
+
*[scikit-learn](http://scikit-learn.org)
29
+
*[MATLAB](http://mathworks.com)
30
+
*[Weka](http://cs.waikato.ac.nz/ml/weka/)
31
+
*[elki](https://elki-project.github.io/)
32
+
*[mlpy](http://mlpy.sourceforge.net)
33
+
*[dlibml](http://dlib.net/ml.html)
34
+
*[milk](https://github.com/luispedro/milk/)
35
+
*[R](https://www.r-project.org/)
36
+
37
+
Nearest Neighbour Algorithms:
38
+
*[ANN](http://www.cs.umd.edu/~mount/ANN/)
39
+
*[FLANN](http://www.cs.ubc.ca/research/flann/)
40
+
*[nearpy](http://pixelogik.github.io/NearPy/)
41
+
*[annoy](https://github.com/spotify/annoy)
42
+
*[mrpt](https://github.com/vioshyvo/mrpt)
43
+
44
+
Inactive toolkits:
45
+
*[HLearn](https://github.com/mikeizbicki/HLearn)
46
+
NOTE: `HLearn` is not currently being benchmarked by this repository.
47
+
23
48
## Prerequisites
24
49
25
50
***[Python 3.3+](http://www.python.org"Python Website")**: The main benchmark script is written in Python. By default, the benchmark script uses the version of Python on your path.
51
+
***[numpy](https://www.numpy.org/)**: NumPy provides a powerful N-dimensional array object and sophisticated (broadcasting) functions useful for handling and transforming data.
26
52
***[Python-yaml](http://pyyaml.org"Python-yaml Website")**: PyYAML is a YAML parser and emitter for Python. We've picked YAML as the configuration file format for specifying the structure for the project.
27
53
***[SQLite](http://www.sqlite.org"SQLite Website")** (**Optional**): SQLite is a lightweight disk-based database that doesn't require a separate server process. We use Python's built-in SQLite support to save the benchmark results.
28
-
***[Valgrind](http://valgrind.org"Valgrind Website")** (**Optional**): Valgrind is a suite of tools for debugging and profiling. This package is only needed if you want to run the memory benchmarks.
29
54
***[python-xmlrunner](https://github.com/lamby/pkg-python-xmlrunner"python-xmlrunner github")** (**Optional**): The xmlrunner module is a unittest test runner that can save test results to XML files. This package is only needed if you want to run the tests.
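
The Python prerequisites above can be checked before running anything. The following is only a minimal sketch, not part of the benchmark system: the helper names `missing_modules` and `check_prerequisites` are hypothetical, and the import names (`yaml` for Python-yaml, `xmlrunner` for python-xmlrunner) are assumptions about how those packages are installed. Note that `importlib.util.find_spec` requires Python 3.4 or newer.

```python
import importlib.util
import sys

# Import names are assumptions: PyYAML is usually imported as "yaml",
# python-xmlrunner as "xmlrunner"; sqlite3 normally ships with CPython.
REQUIRED = ["numpy", "yaml"]
OPTIONAL = ["sqlite3", "xmlrunner"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_prerequisites(min_version=(3, 3)):
    """Summarize the interpreter version check and any missing modules."""
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        "missing_required": missing_modules(REQUIRED),
        "missing_optional": missing_modules(OPTIONAL),
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

An empty `missing_required` list suggests the required Python packages are importable; it says nothing about their versions.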
30
55
56
+
### Prerequisites for Setting up Competing Libraries
57
+
All of the following prerequisite packages must be installed before running the `make setup` command (see the next [section](#running)):
58
+
**FLANN library:**
59
+
*[hdf5](https://www.hdfgroup.org/solutions/hdf5/): This is a high performance data software library.
60
+
*[gtest](https://github.com/google/googletest): This package is used for writing C++ tests.
61
+
62
+
**mlpack:**
63
+
*[Armadillo](http://arma.sourceforge.net/download.html): This package is a C++ library for linear algebra and scientific computing.
64
+
*[Boost C++](https://www.boost.org/): This package is required for compiling mlpack from source.
65
+
66
+
**mlpy:**
67
+
*[scipy](https://www.scipy.org/): Python-based ecosystem of open-source software for mathematics, science, and engineering
68
+
*[GSL](https://www.gnu.org/software/gsl/): This is a numerical library for C and C++ programmers.
69
+
70
+
**scikit-learn:**
71
+
*[scipy](https://www.scipy.org/): Python-based ecosystem of open-source software for mathematics, science, and engineering
72
+
*[joblib](https://joblib.readthedocs.io/): Joblib is a set of tools to provide lightweight pipelining in Python.
73
+
*[Cython-0.25.2](https://cython.org/): C extensions for Python. Required for compiling scikit-learn from source.
74
+
75
+
**Nearpy:**
76
+
*[scipy](https://www.scipy.org/): Python-based ecosystem of open-source software for mathematics, science, and engineering
77
+
*[redis](https://redislabs.com/lp/python-redis/): The Python interface to the Redis key-value store.
78
+
79
+
**shogun:**
80
+
*[swig](https://github.com/swig/swig): SWIG is a compiler that integrates C and C++ with languages including Perl, Python, Tcl, Ruby, PHP, Java, C#, D, Go, Lua, Octave, R, Scheme (Guile, MzScheme/Racket), Scilab, Ocaml. SWIG can also export its parse tree into XM
81
+
82
+
**weka:**
83
+
*[java](https://www.java.com/en/): Java is a programming language on which weka is based.
84
+
85
+
**elki:**
86
+
*[java](https://www.java.com/en/): Java is a programming language on which elki is based.
87
+
88
+
**milk:**
89
+
*[Eigen3](http://eigen.tuxfamily.org/index.php?title=Main_Page): Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
90
+
91
+
**R:**
92
+
*[gfortran](https://gcc.gnu.org/wiki/GFortran): A free Fortran 95/2003/2008 compiler for GCC
93
+
*[readline](https://www.gnu.org/software/readline/): This library provides a set of functions for use by applications that allow users to edit command lines as they are typed in.
94
+
*[libbz2-dev](https://www.sourceware.org/bzip2/): This is a freely available, patent free, high-quality data compressor. Header files of this software are required.
95
+
*[liblzma-dev](https://tukaani.org/xz/): XZ Utils is free general-purpose data compression software with a high compression ratio. Header files of this software are required.
96
+
*[libcurl4](https://curl.haxx.se/libcurl/): libcurl is a free and easy-to-use client-side URL transfer library
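
Some of the dependencies listed above are command-line tools, so a quick check that they are on the `PATH` can catch problems before `make setup` runs. This is a rough sketch under stated assumptions: the executable names (`java`, `swig`, `gfortran`) may differ between platforms, and `find_missing_tools` is a hypothetical helper, not something this repository provides. Libraries and header packages (for example hdf5, Boost or libbz2-dev) are not covered by this check.

```python
import shutil

# Library -> executable it needs; names are assumptions and may vary by platform.
TOOLS = {
    "weka / elki": "java",
    "shogun": "swig",
    "R": "gfortran",
}


def find_missing_tools(tools):
    """Return the entries of `tools` whose executable is not found on PATH."""
    return {lib: exe for lib, exe in tools.items() if shutil.which(exe) is None}


if __name__ == "__main__":
    for lib, exe in find_missing_tools(TOOLS).items():
        print("{}: '{}' not found on PATH".format(lib, exe))
```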
97
+
98
+
31
99
## Running
32
100
33
101
Benchmarks are run with the `make` command.
@@ -234,18 +302,6 @@ methods:
234
302
235
303
In this case we benchmark the pca method located in methods/mlpack/pca.py with the isolet and cities datasets. The pca method scales the data before running PCA. The benchmark runs twice for each dataset. Additionally, the pca.py script supports the following file formats: txt, csv, hdf5 and bin. If the data isn't available in the required format, that format will be generated.
236
304
237
-
## Competing libraries
238
-
239
-
* http://mlpack.org
240
-
* http://mathworks.com
241
-
* http://shogun-toolbox.org
242
-
* http://cs.waikato.ac.nz/ml/weka/
243
-
* https://elki-project.github.io/
244
-
* http://scikit-learn.org
245
-
* http://mlpy.sourceforge.net
246
-
* http://www.cs.umd.edu/~mount/ANN/
247
-
* http://www.cs.ubc.ca/research/flann/
248
-
249
305
## Citation details
250
306
251
307
If you use the benchmarks in your work, we'd really appreciate it if you could cite the following paper (given in BibTeX format):