ANN_TEST\ Compute Approximate Nearest Neighbors {#ann_test-compute-approximate-nearest-neighbors align=”center”} =====================================
ANN_TEST is a C++ program which computes approximate nearest neighbors when the data and query points are available in files.
ANN_TEST relies on the ANN library to quickly and efficiently compute the information. The brute force method simply computes, for each query point, the distance to every data point, and returns the data point whose distance is least.
This approach might seem to be the fastest possible, but it is not. Particularly when there are many data points or query points, it is worth the overhead of computing a data structure that allows nearest neighbor queries to be computed quickly and efficiently.
The greatest efficiency occurs if the user is willing to accept approximate nearest neighbors. That is, the program will return a data point that is either the nearest, or “almost” the nearest, to within some tolerance. In many applications, the controllable error of this approximation is worth the marked speedup in execution.
ANN_TEST is a driver for testing and evaluating the ANN library for computing approximate nearest neighbors. It allows the user to generate data and query sets of various sizes, dimensions, and distributions, to build KD and BD trees of various types, run queries and output performance statistics.
Directives consist of a directive name, followed by list of arguments. Arguments and directives are separated by white space (blank, tab, and newline). String arguments are not quoted, and consist of a string of nonwhite chacters. A character “#” denotes a comment. The following characters up to the end of line are ignored. Comments may only be inserted between directives (not within the argument list of a directive).
ANN_TEST can perform the following operations. How these operations are performed depends on the options which are described later:
Data Generation:
Building the tree
Query Generation/Searching:
Miscellaneous:
How these operations are performed depends on a set of options. If an option is not specified, a default value is used. An option retains its value until it is set again. String inputs are not enclosed in quotes, and must contain no embedded white space (sorry, this is C++’s convention).
The test program can perform the following operations. How these operations are performed depends on the options which are described later:
output_label test_run_0 # Label this run "test_run_0".
validate off # Do not perform validation.
dim 16 # The points are in dimension 16.
stats query_stats # Output performance statistics for queries
seed 121212 # Set the random number seed.
data_size 1000
distribution uniform
gen_data_pts # Generate 1000 uniform data points in dim 16.
query_size 100
std_dev 0.05
distribution clus_gauss
gen_query_pts # Generate 100 query points
# in 10 clusters with std_dev 0.05
bucket_size 2
split_rule kd
shrink_rule none
build_ann # Set up a KD tree with bucket size 2.
epsilon 0.1 # We accept 10% error.
near_neigh 5 # We ask for the 5 nearest neighbors;
max_pts_visit 100 # Stop search if more than 100 points seen
run_queries standard # Run the queries;
data_size 500
read_data_pts data.in # Read up to 500 data points from file data.in
split_rule sl_midpt # Use the sliding midpoint shrink.
shrink_rule simple # Use a simple shrink.
build_ann # Set up a BD tree. midpoint split
epsilon 0 # We accept 0% error.
run_queries priority # Run the queries.
ANN_TEST is available in a C++ version.
ANN, a C++ library which creates and manipulates the data structures necessary to carry out the approximate nearest neighbor computations.
ANN_TO_FIG, a C++ program which can convert the record of a search carried out by ANN_TEST into a graphical image in the FIG format.
David Mount, Sunil Arya.
TEST1:
TEST2:
TEST3:
You can go up one level to the C++ source codes.
Last revised on 15 May 2006.