# Cartography

### Iterative Douglas-PeuckerPolyline Reduction for C

by Carlos Portela

A variation of the Douglas-Peucker polyline reduction algorithm is described, using the C programming language, to reduce the number of points of a cartographic polyline.

Iteration is used instead of recursion, stacking points in a simple dynamic memory list. Distances from points to a segment of a virtual line are calculated, instead of measuring distances to the entire virtual line. Free source code is provided in the PDF version of this article:

##### Map Reduction

Cartographic polyline reduction is part of the process of map generalization, when a map scale needs to be reduced.

A polyline is a sequence of line segments, with each line segment starting where the preceding line segment ended. By forming a polyline, instead of a series of individual line segments, less storage space is required: the polyline stores the coordinates of the first vertex (point) of the polyline, and one vertex for each line segment, instead of storing two vertices for each line segment.

Cartographic data may be used at different resolutions, with lower resolution uses requiring less polyline points to avoid a cluttered look and unnecessary processing. Cartography text books describe performing polyline reduction using the nth point algorithm, and using the Douglas-Peucker algorithm.

The nth point algorithm consists of skipping every n points. That is quick and easy, but misses corners and other important details.

The Douglas-Peucker algorithm has a more sophisticated way of skipping points. It visualizes long lines (virtual lines), and discards points that are close to the virtual lines. That is more calculation-intensive than the nth point algorithm, but provides much better results.

##### Douglas-Peucker algorithm

In 1973, David H. Douglas and Thomas K. Peucker wrote an article titled “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature”, explaining how they developed an algorithm for reducing the number of points in a polyline, to display the polyline at a lower resolution.

Douglas and Peucker developed iterative and recursive versions of their algorithm, in the Fortran and Algol programming languages respectively, on an IBM 370 mainframe computer. Their algorithm measures point distances to a virtual line (not a virtual line segment), except for closed polylines (with coincident first and last points) whereupon distances are measured to the first/last point of the polyline. For open polylines, we propose measuring point distances to a virtual line segment instead of a virtual line.

##### Shortest Distance(to a Line Segment)

This section discusses using linear algebra to calculate the shortest distance from a point to a line segment.

A vector, for this discussion, will be an ordered list of numbers. The following are examples of vectors:

( 1, 2, 3 )     ( 3, 0, 4, 1 )     ( 2, 3 )

The first vector has 3 components; the second and third vectors have 4 and 2 components respectively.

A vector’s number of components represents the number of dimensions its vector space has. In this article, we use vectors that have two components, representing points and lines in 2 dimensions.

The examples in this article may be extended to 3 dimensions, by using vectors with 3 components.

The length of a vector is called its magnitude or norm. This length is a scalar. (A scalar is a single number, not a vector). It can be calculated using the Pythagorean Theorem. For example, consider a point P represented by the vector ( 2, 3 ): Figure 1

The line from the Origin to P can be considered the hypotenuse of a right triangle, with the sides of the triangle being 2 and 3. According to the Pythagorean Theorem, the length of the hypotenuse will equal the square root of the sum of the squares of the sides:

Distance 0,0 to P
=  $\sqrt{2×2 + 3×3}$

A unit vector is a vector with a length of 1. To convert any vector to a unit vector, simply divide each component of the vector with the vector length. This is called normalizing a vector.

The unit vector points in the same direction as the vector it represents, but has a length of 1 no matter what the original vector's length is. For example, the unit vector of ( 2, 3 ) is approximately ( 0.5547, 0.83205 ).

Next we define the Inner Product, which is also called the Dot Product (both terms are used interchangably).

The Inner Product is an operation that is performed on two vectors that have the same number of components (dimensions), and the result is a scalar (a single number, not a vector). This resulting number (the dot product) is the sum of the products of the corresponding components. For example, the dot product of ( 2, 3 ) and ( 5, 6 ) is:

(  2, 3  ) · (  5, 6  )  =  2 × 5 + 3 × 6  =  28

Notice that the symbol for an inner product is a dot ( · ), not the multiplication product sign ( × ).

It is convenient to use labels and symbols to refer to vectors. The vectors ( 2, 3 ) and ( 5, 6 ) can be called u and v respectively:

u  =  ( 2, 3 )

v  =  ( 5, 6 )

The length of a vector is denoted by its name (u or v in this example) enclosed in vertical bars:

||u||  =  $\sqrt{2×2 + 3×3}$  =  $\sqrt{13}$  =  3.6055513

||v||  =  $\sqrt{5×5 + 6×6}$  =  $\sqrt{61}$  =  7.81

The unit vector is a vector divided by its length:

$\text{unit vector of u =} \frac{ \text{u} }{ \text{||u||} }$

The inner product of vectors has several interesting properties. Of particular interest for our task is that the inner product can be used to calculate the projection of one vector onto another, which in turn can be used to calculate the shortest distance from the end of one vector (a point) to a line passing through the other vector.

Consider projecting a vector onto the line of another vector. We'll call the first vector u and the second vector v. Of special interest is the scalar (single number) that represents how far along the line of the vector v is the perpendicular projection of vector u onto the line of v (the distance from 0,0 to point A): Figure 2

Conveniently, that number is the inner product of vector u and the unit vector of v:

$\text{distance from 0,0 to point A = u · } \frac{ \text{v} }{ \text{||v||} }$

After calculating that number, the Pythagorean Theorem can be used to determine the distance from point A to the end of vector u, which is usually the shortest distance from the end of u to the line segment represented by vector v.

However, if the angle between the vectors u and v is greater than 90, the perpendicular projection will not land on the line segment represented by the vector v, but will instead land on a line that is collinear with v but extends past the vector’s endpoint.

In that case, the projection scalar is negative instead of positive, and the shortest distance from the end of u to the line segment represented by v is the length of u: Figure 3

It is very convenient to have the projection scalar be negative in that case. The software simply tests for the projection scalar being negative, and if it is negative then use the length of the vector u as the shortest distance to the line segment. Otherwise, calculate the perpendicular projection distance using the Pythagorean Theorem, and use that as the shortest distance to the line segment.

Finally, there is one last consideration. If the length of u is longer than the length of v and the angle between them is small enough, the perpendicular projection of u onto v may not intersect v, but instead intersect the line that is collinear with v extending past v: Figure 4

In that case, the shortest distance to the line segment is not the perpendicular projection. To detect that possibility, we must repeat this process, but with the v vector reversed: Figure 5

That only happens if the first set of vectors produced a positive (not negative) projection scalar. After reconstructing both vectors, proceed as before with the new vectors. Then if the projection scalar is negative, use the length of this u as the shortest distance from the end point of u to the line segment of v. Otherwise (if the second projection is not negative) use the perpendicular projection distance as the shortest distance.

##### Iterative Implementation

Our iterative implementation consists of two loops, one nested inside the other. The outer loop iterates while there are end points remaining. Douglas and Peucker refer to the end points of the outer loop as an anchor and a floater.

The first iteration of the outer loop has the polyline end points as the outer loop end points (as anchor and floater).

The two outer loop end points define a vector (virtual line segment).

The inner loop traverses the entire polyline between the two outer loop end points, one intervening vertex (point) at a time. For each intervening vertex, the shortest distance between the point and the virtual line segment is calculated. The longest of those distances is recorded as the maximum distance of the intervening points.

If the maximum distance is less than a predefined tolerance, the two outer loop end points are used as the line (the virtual line segment becomes a line segment to use).

Otherwise, one or more intermediate points between the end points are further than the tolerance distance. The furthest point is chosen as a new end point, forming two new virtual line segments to process (two more iterations of the outer loop), by storing (pushing) the new point with one of the previous end points on a dynamically allocated stack, and pushing the new point with the other previous end point on the stack.

The outer loop iterates while there are any point pairs on the stack, popping a pair of points off the stack each iteration.

A possible optimization could be to avoid a comparison square root by comparing to maximum distance squared. Another possible optimization could be to use the distance-to-point (or its square) if the first projection is orthogonal (instead of performing a second dot product in that case). And eliminating the second projection may be possible by comparing line distances of the first projection.

This implementation does not delete polyline points. Points are simply marked as suitable for being used. Ignoring such markings lets you access the original polyline. Or if you want to use the reduced polyline, then use only the marked points.

The included source code assumes the pnUseFlag integer array is zeroed out before invoking the point reduction function. After the function returns, that array can be examined to determine the array indices of the vertices that are to be in the reduced polyline.

The pnUseFlag array can then be used to create other data structures, such as a level-of-detail number for each vertex, as Paul B. Anderson did with MWDBPOLY.

To review what MWDBPOLY was, here is an excerpt from its documentation:

“MWDB-POLY was the mapping database used in a Delphi Component called TWorldMap. In 1996 NASA used software that incorporated the TWorldMap component onboard Atlantis and the Russian space station Mir while in orbit, to help identify and capture photographs of the Earth.”

As the MWDBPOLY documentation states, its polylines were provided with 5 levels of detail:

“Detail level 5 produces the least detailed graphics image. The points at each level of detail are additive to the points at all lower levels. For example, when using detail level 4 the points from both levels 4 and 5 must be used/retrieved”

To develop such a database, you could use multiple invocations of the ReducePoints() function, with a different tolerance for each invocation.