Measuring and Comparing Ellipses

Antonio Zamora

This presentation evaluates methods of comparing ellipses that are fitted to geological features, such as thermokarst lakes and Carolina Bays. The presentation introduces an approach for comparing the goodness of fit of ellipses measured using different scales.

An article published in 2011 about the thermokarst lakes of the Tibet Plateau said that approximately 56% of the Tibetan lakes are elliptical and 23% are elongated. I digitized two of the lakes and fitted them with ellipses by the least squares method. In both cases, many points plotted along the perimeter of the lake are inside or outside of the elliptical curve and only a few points are on the path of the ellipse. The program to fit ellipses calculates the fitting error, which indicates the difference between the elliptical curve and the plotted points, but it needs to be modified to provide a measure that expresses how close the points are to the elliptical curve.

A normal distribution is a symmetrical, bell-shaped curve where most values cluster around the mean and the frequency of values tapers off toward both tails. The standard deviation, represented by the lower case Greek letter sigma, is a measure of how widely data values are spread around the arithmetic mean. A small standard deviation indicates that the data points are clustered closely together, while a large standard deviation shows that the data is more widely dispersed. Standard deviation is calculated by taking the square root of the variance, which is the average of the squared differences from the mean. Standard deviation is often illustrated with a bell-shaped curve, where the mean is the central peak of the curve, and divisions along the horizontal axis indicate multiples of the standard deviation.

The standard deviation is calculated as follows. Find the mean by adding all the data points and dividing by the number of points. Calculate the deviations by subtracting the mean from each data point. Square each deviation and add all the squared differences. This result is called the sum of squares. Divide the sum of squares by the number of data points to calculate the variance, which is like the Mean Squared Error. Finally, take the square root of the variance to obtain the standard deviation, sigma. But there is a problem. All these results change when the scale changes. If you measure in meters, you get different results than when you measure in centimeters.
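The steps above can be sketched in a few lines of Python. This is only an illustration and is not taken from the ellipse-fitting program. The function name `population_std` and the sample values are hypothetical. The same measurements expressed in centimeters give a standard deviation one hundred times larger than in meters, which is the scaling problem.

```python
import math

def population_std(values):
    """Standard deviation computed with the steps described in the text."""
    n = len(values)
    mean = sum(values) / n                                   # find the mean
    squared_deviations = [(v - mean) ** 2 for v in values]   # square each deviation
    sum_of_squares = sum(squared_deviations)                 # add the squared differences
    variance = sum_of_squares / n                            # divide by the number of points
    return math.sqrt(variance)                               # square root of the variance

# Hypothetical measurements in meters (illustrative values only).
meters = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# The same measurements expressed in centimeters.
centimeters = [m * 100.0 for m in meters]

sigma_m = population_std(meters)
sigma_cm = population_std(centimeters)
print(sigma_m, sigma_cm)
```

The data has not changed between the two calls, only the unit, yet the two values of sigma differ by the unit conversion factor.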

The scaling problem of the standard deviation can be illustrated by looking at identical plots where the coordinates of the right plot have been scaled to be one hundred times smaller than the coordinates of the left plot. The fitting error remains invariant because it has been normalized relative to the semiminor axis of the ellipse. However, the standard deviation for the plot with the smaller coordinates is only 2.3, which could be interpreted as a very good fit, but it is actually just as bad as the fit for the plot on the left with a standard deviation of 232.4.

Normalization Relative to the Semiminor Axis of the Ellipse

The fitting error was the same for both plots because the error distances are normalized relative to the semiminor axis of the ellipse. Standard measures of goodness of fit, like the Mean Squared Error, do not provide meaningful comparisons for ellipses of different sizes. It is necessary to define an error measure that is independent of the number of sample points and the size of the ellipse. This can be achieved by dividing the average error by the semiminor axis of the ellipse and expressing the result as a percentage. Dividing by the semiminor axis expresses the average error as a proportion of the ellipse, which allows comparisons of fitting errors for ellipses of different sizes.
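A minimal sketch of this normalization follows. It rests on several assumptions that are not from the program itself: the ellipse is axis-aligned and centered at the origin, and the error distance of a point is measured along the ray from the center to the point. The function names `radial_error` and `fitting_error_percent` and the sample points are hypothetical.

```python
import math

def radial_error(x, y, a, b):
    """Distance from (x, y) to the ellipse along the ray from the center.

    Assumes an axis-aligned ellipse centered at the origin with semimajor
    axis a and semiminor axis b. This is one simple choice of error
    distance, used here only for illustration.
    """
    theta = math.atan2(y, x)
    # Polar radius of the ellipse in the direction theta.
    r_ellipse = (a * b) / math.hypot(b * math.cos(theta), a * math.sin(theta))
    return abs(math.hypot(x, y) - r_ellipse)

def fitting_error_percent(points, a, b):
    """Average error distance as a percentage of the semiminor axis b."""
    errors = [radial_error(x, y, a, b) for x, y in points]
    return 100.0 * (sum(errors) / len(errors)) / b

# Hypothetical perimeter points near an ellipse with a = 200 and b = 100.
points = [(210.0, 0.0), (0.0, 95.0), (-190.0, 0.0), (0.0, -105.0)]
large = fitting_error_percent(points, 200.0, 100.0)

# The same shape with every coordinate one hundred times smaller.
small_points = [(x / 100.0, y / 100.0) for x, y in points]
small = fitting_error_percent(small_points, 2.0, 1.0)
print(large, small)
```

Because both the error distances and the semiminor axis shrink by the same factor, the percentage is the same for the large and the small version of the plot.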

This slide introduces a scaling factor calculated by dividing one thousand by the semiminor axis. One thousand is an arbitrary number in the same way that one hundred is used for the computation of percentages. Each deviation is multiplied by the scaling factor before squaring. This type of scaling is possible for ellipses because the semiminor axis is an intrinsic property of an ellipse, analogous to the radius of a circle. The resulting scaled mean squared error and scaled deviation are the same for both examples in spite of the scaling differences in the data points. This provides a method for comparing the goodness of fit of ellipses obtained by different measuring methods.

An important difference between the standard deviation and the scaled deviation is that the standard deviation is calculated based on the difference of the sample points from the mean, whereas the scaled deviation is calculated based on the difference between the sample points and the corresponding coordinates of the points on the ellipse, multiplied by the scaling factor.
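The calculation can be sketched as follows, assuming that the error distances between the sample points and the ellipse have already been measured. The function name `scaled_deviation`, its `reference` parameter, and the sample error distances are hypothetical and only serve to show that the result does not depend on the units of the data.

```python
import math

def scaled_deviation(error_distances, semiminor_axis, reference=1000.0):
    """Root of the mean squared error after scaling by reference / semiminor axis."""
    scale = reference / semiminor_axis
    # Each deviation is multiplied by the scaling factor before squaring.
    scaled_squares = [(e * scale) ** 2 for e in error_distances]
    scaled_mse = sum(scaled_squares) / len(scaled_squares)
    return math.sqrt(scaled_mse)

# Hypothetical error distances for an ellipse with a semiminor axis of 100.
errors_large = [10.0, 5.0, 10.0, 5.0]
sd_large = scaled_deviation(errors_large, 100.0)

# The same ellipse and points with coordinates one hundred times smaller.
errors_small = [e / 100.0 for e in errors_large]
sd_small = scaled_deviation(errors_small, 1.0)
print(sd_large, sd_small)
```

Unlike the standard deviation, the deviations here are distances from the ellipse, not from the mean, and the two results agree even though the coordinates differ by a factor of one hundred.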

The scaled deviation is calculated like the standard deviation, but with the error distances scaled to an ellipse with a semiminor axis of 1000. This makes it possible to compare the goodness of fit of the ellipse and the precision of the sample points using a common frame of reference. This particular Tibetan lake has a fitting error of 12.12 percent. Herndon Bay, in North Carolina, has a fitting error of 1.0 percent. Just from looking at the fit of the ellipses through the points, we can see that the geometry of Herndon Bay matches the ellipse almost perfectly. The scaled deviation of the Tibetan lake is 128.5 and the scaled deviation of Herndon Bay is 11.5. The fit of the points for Herndon Bay is approximately 11 times more precise than the fit of the points for the Tibetan lake.

These are some Carolina Bays that I processed previously to calculate the fitting error. I processed them again to calculate the Scaled Deviation. In general, the scaled deviation increases as the fitting error increases. For Carolina Bays, the fitting error is usually less than 3 percent. The scaled deviations for the Carolina Bays are relatively small because the sample points fall very close to the elliptical curves.

Scaled Deviation of Thermokarst Lakes in Alaska

Fitting ellipses to the thermokarst lakes in Alaska gives huge fitting errors that far exceed the fitting errors for the Carolina Bays. In addition, the Scaled Deviation is very high for these lakes, which means that the sample points along the perimeter of each lake fall very far from the elliptical curve.

In a side-by-side comparison, the Scaled Deviations of the points used to fit ellipses to the Carolina Bays are about 10 times smaller than the Scaled Deviations for the thermokarst lakes in Alaska or for the thermokarst lakes in the Tibetan Plateau. These large differences in geological morphology should be sufficient to convince geologists that the Carolina Bays are very different from geological structures originating from ice melt processes or wind and water mechanisms.

Based on the satisfactory outcome of these test cases, I have updated the Python program that fits ellipses to the Carolina Bays to include the calculation of the Scaled Deviation, which improves the comparison of goodness of fit by normalizing the errors relative to the semiminor axis of the ellipse. The program is open source and available from GitHub. Try it out and let me know how it can be improved.

https://github.com/citpeks/Carolina-Bays-least-squares-ellipse-fitting
