In this project report, we have shown 3 algorithms for executing multi-dimensional joins as well as an in-depth survey and exploration of a low-dimensional spatial join. The initial algorithm, the Iterative Spatial Join, performs a spatial join on low-dimensional data and it is founded on a plane-sweep approach. The Iterative Spatial Join functions well when internal memory is minimal, in comparison to competing techniques. This indicates that the Iterative Spatial Join will be a good choice for very large data sets or in cases where internal memory is a shared resource and is therefore modest, for example today’s database engines that usually share internal memory between a number of queries. In addition, the performance of the Iterative Spatial Join is predictable and has no parameters that require to be tuned, as opposed to other algorithms. The next algorithm, the Quickjoin algorithm, performs a higher-dimensional similarity join in which pairs of objects which lie within a certain distance …
Watch a video on Spatial Join
Contents
1 Introduction
1.1 Spatial Joins On Unindexed Datasets
1.2 Similarity Joins
1.3 The Hausdorff Distance
2 Survey of Spatial Joins
2.1 Spatial Join Basics
2.1.1 Design Considerations and Parameters
2.1.2 Minimum Bounding Rectangles and Approximations
2.1.3 Linear Orderings
2.2 The Filtering Stage – Internal Memory
2.2.1 Nested-Loop Joins
2.2.2 Indexed Nested-Loop Join
2.2.3 Plane Sweep
2.2.4 Z-Order Methods
2.3 The Filtering Stage – External Memory
2.3.1 Both Datasets Indexed
2.3.1.1 Hierarchical Traversal
2.3.1.2 Non-Hierarchical Methods
2.3.1.3 Transform to Multi-Dimensional Points
2.3.1.4 Node to Node Comparison
2.3.2 One Dataset Not Indexed
2.3.2.1 Constructing a Second Index
2.3.2.2 An Index as Partitioned Data
2.3.2.3 An Index as Sorted Data
2.3.3 Neither Dataset Indexed
2.3.3.1 Basic Partitioning Algorithm
2.3.3.2 Determining the Number of Partitions
2.3.3.3 Repartitioning
2.3.3.4 Avoiding Duplicate Results
2.3.4 Partitioning Using Grids
2.3.5 Partitioning with Strips
2.3.6 Partitioning by Size
2.3.7 Data-Centric Partitioning
2.4 Filtering Extensions
2.4.1 False Hit Filtering
2.4.2 True Hit Filtering
2.4.3 Non-Blocking Filtering
2.5 The Refinement Stage
2.5.1 Ordering Pairs
2.5.2 Polygon Intersection Test
2.5.3 An Alternate Intersection Test
2.6 Specialized Spatial Joins
2.6.1 Multiway Joins
2.6.1.1 Multiway Indexed Nested-Loop Spatial Joins
2.6.1.2 Multiway Hierarchical Traversal
2.6.2 Parallel
2.6.3 Distributed
2.6.4 Other Techniques
2.7 Selectivity Estimation
3 Iterative Spatial Join
3.1 Internal Memory Plane Sweep
3.1.1 Description
3.1.2 Analysis
3.2 Iterative Spatial Join
3.2.1 Description
3.2.2 Analysis
3.3 Experimental Results
3.3.1 Experimental Setup and Implementation
3.3.2 The Data
3.3.3 The Results
3.3.3.1 Real Data
3.3.3.2 Experiments Varying Available Internal Memory
3.3.3.3 Experiments Varying the Input Size
3.3.3.4 Algorithm Overhead
4 Quickjoin
4.1 Related Work
4.1.1 Epsilon Grid Order (EGO)
4.1.2 GESS
4.2 The Quickjoin Algorithm
4.2.1 Quickjoin Using Ball Partitioning
4.2.2 Quickjoin Using Generalized Hyperplane Partitioning
4.2.3 Dimensional Version – Vector Data
4.2.4 External Memory Algorithm Using Multiple Pivots (Partitions)
4.2.5 Split Windows
4.3 Analysis
4.3.1 CPU Performance Analysis…..
Source: UMD