

Feb 29 2012

64-bit Python Computing

Recently I ran into a memory problem running large point cloud arrays through Python and NumPy.  I quickly determined that I was asking NumPy to work on massive arrays that exceeded the limits of a 32-bit Python process on Windows.  I came up with a workaround: truncate the UTM X-Y coordinates so the numbers can be stored as 32-bit floating point values without losing precision, then add the truncated digits back at the end.  Basically, I translated the X-Y coordinates (352845.49 4346713.91 --> 2845.49 6713.91), ran the computation, and then translated them back to the original UTM values.  This was OK, but I wanted to overcome the 32-bit limit in Python.
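For the record, the trick is just a constant subtraction before casting down to 32 bits; a minimal NumPy sketch (the offset constants are hypothetical, chosen to match the example coordinates above):

```python
import numpy as np

# Offsets are hypothetical, chosen to match the example coordinates above.
X_OFFSET, Y_OFFSET = 350000.0, 4340000.0

def to_local(xy_utm):
    """Subtract the site offset so coordinates fit in 32-bit floats."""
    return (xy_utm - np.array([X_OFFSET, Y_OFFSET])).astype(np.float32)

def to_utm(xy_local):
    """Add the offset back after computation to restore full UTM values."""
    return xy_local.astype(np.float64) + np.array([X_OFFSET, Y_OFFSET])

xy = np.array([[352845.49, 4346713.91]])
print(to_local(xy))          # [[2845.49 6713.91]]
print(to_utm(to_local(xy)))  # back to the original UTM values
```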

There are unofficial 64-bit builds available here, but I wanted to try something more established.  I set up one of our machines to dual-boot Windows 7 and 64-bit Ubuntu 11.10 and compiled Python, SciPy, and NumPy under Ubuntu.  These builds are inherently 64-bit because of the 64-bit OS install, so there are no issues addressing large arrays of data.  Back to work!

Dec 26 2011

Ecogeo versus spline codes

There was one last thing that I did for the error analysis. 

Going through the raw PLY data set from Herbert Run Spring 2010, which is in an arbitrary coordinate system, I picked out the locations of 5 buckets arranged in the shape of an X on campus: buckets 100, 102, 108, 111, and 114.

Using ScanView as before, I picked out each of these bucket locations by individually choosing points that appeared to be part of a clump of orange within the area where a bucket should be. I took the average of the x, y, z coordinates of each set of points to obtain an approximate center for each bucket in the arbitrary coordinate system generated when the point cloud was made. I then paired these coordinates with the referenced GPS locations of each specific bucket.

These data were used by a different Python script, ecogeo4.py, which is another way of getting the 7 Helmert parameters needed to transform the arbitrary point cloud into the correct GPS coordinate system. The script takes a single parameter text file in the following format:

arbitraryX arbitraryY arbitraryZ realX realY realZ,

one point per row, with values separated by spaces, not tabs.
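Assuming the file is plain whitespace-delimited text as described, it can be read with NumPy along these lines (the file name is hypothetical):

```python
import numpy as np

# Each row: arbitraryX arbitraryY arbitraryZ realX realY realZ
pairs = np.loadtxt("bucket_pairs.txt")  # hypothetical file name
arbitrary_xyz = pairs[:, 0:3]           # point cloud (arbitrary) coordinates
real_xyz = pairs[:, 3:6]                # corresponding GPS coordinates
```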

Using the 5 buckets mentioned above, I ran the ecogeo code to obtain a new set of Helmert parameters. I then used the applyHelmert Python script to transform a list of the bucket locations in the raw point cloud, consisting of just 14 points.
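(I won't reproduce applyHelmert here, but a standard 7-parameter Helmert transform looks roughly like the following sketch; the parameter ordering and rotation convention are assumptions on my part.)

```python
import numpy as np

def apply_helmert(points, tx, ty, tz, scale, rx, ry, rz):
    """Apply a 7-parameter Helmert transform, x' = t + scale * R @ x.
    Angles are in radians; parameter ordering here is an assumption."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])  # rotation about x
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])  # rotation about y
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])  # rotation about z
    R = Rz @ Ry @ Rx
    return np.array([tx, ty, tz]) + scale * points @ R.T
```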

This yielded results similar to those from the spline.py code. The z direction is still inverted, and it is the coordinate contributing most of the error; the x and y directions are very good.
For the transformed x values versus the expected x values, the trend line is y = 1.0008x - 265.39, with an R² of 0.9998.

For the y values, y = 0.9998x + 3372.5, also with an R² of 0.9998.

The z coordinates are odd, with a trend line of y = -0.2557x + 68.562 and an R² of 0.1563, which is bad not only because the data are inverted, but because the fit suggests the two are barely related.

These data produced root mean square errors (RMSE) of the distances between actual and predicted bucket locations of 2.354 in the XY plane, 9.045 in the Z direction, and 9.346 overall.
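RMSE values like these come directly from the coordinate differences; a sketch (function and array names are hypothetical):

```python
import numpy as np

def bucket_rmse(predicted, actual):
    """RMSE of distances between predicted and actual (N, 3) bucket positions."""
    diff = predicted - actual
    rmse_xy = np.sqrt(np.mean(np.sum(diff[:, :2] ** 2, axis=1)))  # horizontal
    rmse_z = np.sqrt(np.mean(diff[:, 2] ** 2))                    # vertical
    rmse_3d = np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))         # overall
    return rmse_xy, rmse_z, rmse_3d
```

As a sanity check, 2.354² + 9.045² ≈ 9.346², so the XY and Z errors combine into the overall error as expected.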

The results I received with the spline code had RMSEs of 4.198 for XY, 95.167 for Z, and 95.299 overall. Obviously the spline code does a much worse job converting the data in the z direction than the ecogeo code does, but in the XY plane the errors are not too far apart.

Overall, the spline code works almost as well as the ecogeo code did on this small data set in the x and y directions, but the z direction remains confused by the inversion.

Nov 30 2011

Georeferencing Code Updates

Continuing from my last post, I ran the same analysis on the Herbert Run point cloud generated from spring 2011. It turned out that, at first, the GPS data were not ordered properly, so the spline function did not work correctly. This yielded the following results:

The x-y-z axes show how the data are oriented. Ideally, this picture would show an untilted image, as if one were looking straight down on the campus. This point cloud was given an incorrect set of Helmert parameters because the spline of the GPS and camera data was poorly constructed. Once this problem was fixed and I analyzed the data again, I got much better results.


This point cloud transformation was much better now that the GPS points were in the correct order. The x and y axes appear to be close to where they should be, and it seems we are looking straight down onto campus, but there is one glitch this picture does not show: all of the z coordinates appear to have been inverted. The high points in the point cloud are actually the low points, and the low points in the cloud are the real high points. This shows up in the analysis of the orange field buckets' positions in the point cloud versus their actual positions in space when the pictures were taken.

These scatter plots are for this second attempt at transforming the point cloud. The graph plots the x values of the manually detected buckets in the point cloud versus the actual GPS coordinates of those buckets in the field. The trend line for the x coordinates is y = 0.996x + 1398.7 with R² = 0.9995. The graph of the y values is not shown, but is very similar to the first graph; the trend line for the y values is y = 1.0073x - 31820 with R² = 0.9994. The x and y graphs each show a strong correlation between the two data sets, and both slopes are very close to 1.

The second graph shown is for the estimated z coordinates of the buckets versus the GPS z coordinates. You can see a correlation between the two from the trend line, but the slope is negative: y = -1.0884x + 187.29 with R² = 0.9872. This negative slope appears to be tied to the fact that all of the point cloud data had inverted z coordinate values.
Overall, these results are much, much better than the original ones. We are currently trying to find a solution to the inverted z-axis; the following is the first attempt to fix this problem.

When the Helmert parameters were compared to those from the original Herbert Run data set from Fall 2010, the fourth parameter, which controls scaling, turned out to be negative for the spring. We wanted to see how the transformed point cloud would react if we forced the scaling constant to be greater than zero. This change resulted in the following point cloud orientation:

This did exactly what we wanted for the z-axis: all the real-world high points became point cloud high points and lows became lows. The obvious problem is that it inverted the x and y axes. This "solution" did not really solve much, since it introduced in different axes the very problem it was attempting to fix. The correlations between the 3 sets of variables changed only in that the slopes of the trend lines flipped sign; the R² values did not change when the scale parameter was altered. Besides this, despite having the z-axis in the correct orientation, the data seem a little weird. The z coordinates fall in a range of about (-3, 7). I took the differences between the real GPS heights of the buckets and the calculated heights, and there is a consistent offset between the two: the calculated heights sit about 50.7 units below the expected GPS heights, for each bucket.
Next I want to see what happens if I alter the applyHelmert code to multiply the z-axis terms by the absolute value of the scale parameter while leaving the x and y multiplications alone, as sketched below. If we can keep the x and y axes from the first attempt with ordered data, and get the z-axis orientation by multiplying only the z components by the absolute value of the scale parameter, the point cloud should be oriented the correct way, just translated down too low by a constant amount. (Which is something that has not been explained yet.)
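A minimal sketch of that change, assuming the transform has the form x' = t + s·R·x (the function and argument names are hypothetical):

```python
import numpy as np

def apply_helmert_abs_z(points, translation, scale, R):
    """Helmert transform variant that scales z by |scale| so a negative
    scale parameter cannot flip the vertical axis. Sketch only; the names
    and the form x' = t + s * R @ x are assumptions."""
    rotated = points @ R.T
    scaled = rotated * np.array([scale, scale, abs(scale)])
    return scaled + np.asarray(translation)
```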

Nov 22 2011

Analyzing the Point Cloud Transformations

This graph represents the data for the Herbert Run site from October 11, 2010. I used ScanView to locate the exact coordinates of the orange buckets in the transformed point cloud that was created with the previously written Helmert code. The values on the x-axis are the actual GPS values from the georeferencing in the x direction, where higher values are more western, I believe. The values on the y-axis are the calculated means of the orange points I extracted with ScanView. The black line is the line of best fit and has a slope of 0.9941, which is quite close to 1; a slope of exactly 1 would indicate an exact correspondence between the two data sets. This is good in two ways: the slope is positive, so there is a positive correlation between the two data sets, and it is very close to 1, which means the correspondence is strong. The graph for the y values is very similar, with a positive slope of 1.0079. What makes this look especially good is the result I got before this analysis, with the point cloud of a different data set.
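For reference, the trend-line slopes and R² values in these posts come from ordinary least-squares fits, which can be computed like this (a sketch; the function and array names are hypothetical):

```python
import numpy as np

def trend_line(gps_vals, cloud_vals):
    """Least-squares slope, intercept, and R-squared for one coordinate axis."""
    slope, intercept = np.polyfit(gps_vals, cloud_vals, 1)
    fitted = slope * np.asarray(gps_vals) + intercept
    ss_res = np.sum((np.asarray(cloud_vals) - fitted) ** 2)
    ss_tot = np.sum((np.asarray(cloud_vals) - np.mean(cloud_vals)) ** 2)
    return slope, intercept, 1 - ss_res / ss_tot
```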

This is for the knoll site from fall 2010. There is a negative correlation, and the slope is nowhere close to 1, so the transformation of this particular point cloud did not turn out well at all. It is possible that I made a mistake running the spline.py code to get the 7 Helmert parameters. The 4th parameter, which is for scaling, was negative, which does not seem right, but it also looked like the data were not rotated enough. I still have another data set to test, and once that is done I am going to retry this one to see if it was just a mistake I made.

A small note about the color-based bucket search: some of the buckets were sitting on top of blue boxes, which seemed to alter the color of the orange points; they looked fairly pink, which was not a color I was searching for. This could be one reason why some of the buckets did not register in my search. Jonathan also pointed out that some of the trees were starting to change color at this point in the season, which could be a small source of extraneous points.

Apr 07 2011

Open Source Terrain Processing

I am very excited by the current prospects of incorporating free, open-source terrain processing algorithms into our workflow.  While we are ultimately interested in studying the trees in our 3D scans, it is necessary to automatically derive a digital terrain model (DTM) that represents the ground below the canopy for the purpose of estimating tree height.

A recent paper in the open-access journal Remote Sensing describes several freely available algorithms for terrain processing.  I am in the process of converting the entire ArcGIS workflow we used in our first paper into an automated Python workflow, and am excited about the prospect of incorporating other open-source algorithms into the mix.  Currently, by working with NumPy in Python, my processing code takes an input Ecosynth point cloud and applies two levels of 'global' and 'local' statistical filtering to remove outlier and noise elevation points, handling about 500,000 points in roughly a minute.  This had previously taken hours with ArcGIS, but with the data formatted into arrays, NumPy screams through all the points in no time.
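The 'global' pass is essentially an elevation z-score threshold over the whole cloud; a minimal sketch of the idea (the threshold value is an assumption, not our production setting):

```python
import numpy as np

def global_elevation_filter(points, n_std=3.0):
    """Keep points whose elevation is within n_std standard deviations of
    the mean elevation of the whole (N, 3) cloud; threshold is an assumption.
    The 'local' pass would repeat the same test within small grid cells."""
    z = points[:, 2]
    keep = np.abs(z - z.mean()) <= n_std * z.std()
    return points[keep]
```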

I am going to focus on two pieces of software.  One is the Multiscale Curvature Classification algorithm (MCC-LIDAR) by Evans and Hudak, at SourceForge here, which was mentioned in the recent Remote Sensing paper.  The other is the libLAS module for Python, included with OSGeo, which can read and write the industry-standard LAS data format for working with LiDAR data. Fun, fun!  This, of course, is going on in the meantime while I try to get my proposal finished.
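If I have the libLAS Python bindings right, reading points from a LAS file is roughly this simple (the file name is hypothetical):

```python
from liblas import file as lasfile

# Read every point from a LAS file (file name is hypothetical)
las = lasfile.File("scan.las", mode="r")
for p in las:
    print(p.x, p.y, p.z)
las.close()
```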

Refs: 

Dandois, J.P.; Ellis, E.C. Remote Sensing of Vegetation Structure Using Computer Vision. Remote Sens. 2010, 2, 1157-1176.

Tinkham, W.T.; Huang, H.; Smith, A.M.S.; Shrestha, R.; Falkowski, M.J.; Hudak, A.T.; Link, T.E.; Glenn, N.F.; Marks, D.G. A Comparison of Two Open Source LiDAR Surface Classification Algorithms. Remote Sens. 2011, 3, 638-649.

Mar 14 2011

Getting Friendly With Python

We made great progress today on our Python coordinate transformation algorithm. We are using SciPy and its optimization module to find the best approximation of our transformation parameters, and are almost done finding a solution. Much thanks to Jonathan for being super helpful!
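A rough sketch of the approach, not our actual code: treat the seven Helmert parameters as unknowns and minimize the residuals between the transformed arbitrary points and their GPS counterparts with SciPy's least-squares optimizer (the names and the starting guess are assumptions):

```python
import numpy as np
from scipy.optimize import leastsq

def helmert(params, pts):
    """x' = t + s * R @ x, with R built from three rotation angles."""
    tx, ty, tz, s, rx, ry, rz = params
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    R = (np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]) @
         np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]) @
         np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]]))
    return np.array([tx, ty, tz]) + s * pts @ R.T

def residuals(params, arbitrary, real):
    """Flattened misfit between transformed arbitrary points and GPS points."""
    return (helmert(params, arbitrary) - real).ravel()

# Starting guess: no translation or rotation, unit scale (all assumptions).
guess = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float)
# best, _ = leastsq(residuals, guess, args=(arbitrary_xyz, real_xyz))
```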

Mar 02 2011

From R to Python…

My journey to write a coordinate transformation program in R has taken a turn toward Python, for its ability to integrate with various software products. R is built around matrices, so reading in files and manipulating data columns to create the coordinate transformations was much easier there than in the more general-purpose language of Python. With Python, even reading in a data table required considerable syntax research and subsequent list manipulation. Hopefully this journey will get easier as I continue down the Python road.

Feb 11 2011

Rising Popularity of the R Programming Language

According to a recent analysis of the search-hit popularity of the top 100 programming languages, the R statistical computing language has surpassed both MATLAB and SAS.

I first read about this on the Revolutions blog, a blog dedicated to news and content about R, and was happy to see from the survey report charts that the free R software is relatively popular compared to similar languages.  It is worth noting that the popularity differences are slight, because the survey counts many languages far more popular than R, MATLAB, or SAS: R (#25) had a popularity of 0.561%, MATLAB (#29) 0.483%, and SAS (#30) 0.474%, while Python (#4) has a popularity of about 7%, C (#2) about 15%, and Java (#1) about 18.5%.  The Revolutions blog also makes the important point that the methods used to compute these statistics may be a bit controversial, but they still serve a purpose.

I first learned R from taking a graduate level statistics course at UMBC, Environmental Statistics 614, and have developed my skills with the programming language to help with data analysis and preparing graphs and figures for papers.  I used R to perform the data analysis and generate the non-map figures for our first paper on Ecosynth and will continue to do so for future publications.

I have only used MATLAB to run a camera calibration program for my Computational Photography class last semester, and I learned a bit of SAS programming for my Multivariate Statistics course last year.  I think both have their uses, but I am really fond of the relatively lightweight size and 'cost' of R.  I am also interested in adding the scientific and numerical programming functions of Python, SciPy, and NumPy.  The SAGE project builds on SciPy and NumPy to establish a robust, free, open-source alternative to for-pay analytical tools like MATLAB, and is also increasing in popularity.

Free open-source revolution!  This makes me want to put up a post about open-source GIS software...

Oct 19 2010

Flight Telemetry Data and Monitoring the Flight

We recently learned about the robust telemetry data that the Flight Control board creates when a MicroSD card is plugged into the onboard slot.  Since then we have always flown with the card in to capture the telemetry data and it is proving to be an invaluable part of the post-flight assessment process.

The next steps will require a *simple* GIS workflow to evaluate whether the path actually flown matches the pre-planned path closely enough to provide the necessary image sidelap and endlap for 3D reconstruction.

We have already seen first hand how deviations from the flight plan tracks result in gaps in the vision reconstruction, and how a misaligned platform orientation results in gaps as well.  At a flight at SERC, the platform and camera were oriented 90 degrees from the path of travel, resulting in no sidelap between parallel tracks (link to Photosynth).  At a recent flight in New Jersey, the wind blew the little guy around a lot, and even though the camera orientation was spot on, the width between some parallel tracks exceeded our flight plan for achieving sidelap, again resulting in large scan gaps (link to Photosynth).

So what does this mean?  It is necessary not only to have spot-on preparation of camera orientation and flight planning prior to flight, but it will be necessary to run diagnostics in the field to evaluate whether the route flown was sufficient for 3D reconstruction.  I wrote a simple python script to translate the Mikrokopter GPX telemetry data into a text file for use in a spreadsheet or GIS program.  By doing this we can look at characteristics of the flight system through time (e.g., battery voltage vs. flight time) or space (e.g., Navi-Ctl status messages at each GPS point along the flight path).  The Python script that translates Mikrokopter GPX telemetry data to a text file is located on our Coding Corner page.