The shortest summary I've been able to create works as follows [extreme draft, but comments are still welcome]:
The modelling section of the application consists of three subparts. These are called Pre-modelling, Parameter Value Generation and Post-modelling. In short, the system lets a user create models from his or her perception of the object of interest, followed by the step where images are used to find distances between different coordinates in the object and finally the user goes into the post-modelling part of the system, to correct possible mistakes made in the previous parts of the process.
Here, a user can specify how a building, or object (hereafter all specified as buildings), is put together. This is done by fitting different ‘blocks’ together, where a block could for example be a cube to model the base of the building or a pyramid for the roof. In this part, a user can specify parameter constraints, both in the same block and between different block. For example, the height is often the same at the four corners of a house, while the roof is aligned in all four directions of a square building, and also has its bottom at the top of the base.
The camera can be moved to fit background images, or the user can simply model on-the-fly.
Generating World Coordinate Values
In this part of the process, the user specifies 2D image coordinates – most often in photos – which correspond to 3D world coordinates for the final model. This is the most work intensive part of the process, and consists of three steps:
1) Find the Fundamental matrix (F) between images. By clicking on at least seven corresponding points the minimal solution, RANSAC or the eight-point algorithm can be used. Together with the cameras intrinsic parameters K and K’ the Essential matrix (E) can then be found. Through SVD the camera rotation and translation are derived.
2) For each point correspondence compute/triangulate the corresponding 3D space coordinate X that project to those image points. Initialize structure from two views. For each new image/image pair:
a. Determine pose
b. Refine current values (see more on point 3)
c. Extend structure
3) Weigh points depending on angle compared to camera – Better angle (width etc.) when straight forward than at narrow angle. All surfaces are two dimensional, and should be evaluated as such before moving to the third dimension. Pick corners in post-modelling stage by putting boxes (primitives) at each vertex. When vertices are the same for multiple surfaces, only use one box (which can be picked to change values).
4) Move coordinates to set one model corner at world origin (must stand on ground, with one corner at (0,0,0)). Show with a ground plane, and let user change coordinates. This way, we’ll align this corner with the tracked points from a video stream, where one corner should be set to (0,0,0) and the width/length of the tracked square should be set to the width/length of the modelled building.
5) Save mesh: Save the model as x-file, move texture images to assigned folder together with ".x"-files
With all these parameters, the application calculates 3D world coordinates depending on which coordinates have been specified from images, which different parameters should have the same value and if one or other value is unlikely for the world coordinate.
This part is very similar to the pre-modelling. Here, the user can look at the values given from the second step of the modelling process, and for example change single values (such as block height or width) or set new alignments if an error is found. From this step, it is also possible to go back to the second step of the modelling process, to refine the measurements or even add new parts of the building. This way, the user can create one part of the building at a time, if details are needed. Also, a rougher initial model can be created to see an early sample of the building.
The post-modelling part of the system lets the user set the camera to the transformation used in specified images, which can then be used as background to a wireframe version of the model.
The texturing process from the users point works by finding corners of the surfaces which are wanted as texture images. After specifying the same four corners in a number of images (1-N), the user lets the application work in the following way:
For each texture image pixel (x,y) coordinate:
- Use the specified corners and specified texture size to calculate the homography from image to texture.
- Use the homography to find the pixel colour value (0-255) in image i, and put the value in a histogram together with all corresponding coordinates from the specified images.
- Find the histogram bin with the highest occurrence, and use this to set the textures corresponding pixel colour value.
- If the resulting texture image gives an unsatisfactory result, remove or add more images and go back to step one. The result might be unsatisfactory due to for example partial occlusion, image artefacts (if using too few images), pixelated regions (due to perspective distortion in the original image, or the texture size being too small comparing to how close the camera gets to the finished model) or a too blurry image.