Thursday, August 31, 2006

Texture creation from multiple photos

When creating a 3D model from photos, you of course want some textures for the walls etc. However, when taking photos of the building you wish to model, some parts of the building may be occluded by other objects, such as trees, cars or other buildings. If this is the case from all the angles you take the photos from, you'll probably want to get rid of the occluding objects when creating the textures, so you only get the pixels which actually show the wall. One option, which I wouldn't recommend, is to take the average of the colour values across the views showing the wall. Another option could be to detect the occluding object and not use those pixels when taking pixel values from that particular view. The method used in my project was the following (a rough code sketch is given after the list):
  • Get the homographies between the different views from point correspondences - where the simplest method uses four clicks in the corners of the wanted texture. Using only four points may give a lower accuracy on the homography, but gives good enough results for an architectural scene, where you don't get too close to the buildings.
  • For each pixel in the wanted texture:
    1. Find the corresponding coordinate in each clicked image (using the homographies).
    2. Put the pixel values for the coordinate in a histogram.
    3. Find the value with the maximum occurrence in the histogram, thereby determining the value at that texture coordinate.
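
Roughly, in code, the voting looks like the sketch below (written against OpenCV's newer C++ API; the function name and the coarse colour quantisation are my own choices for illustration):

```cpp
#include <opencv2/opencv.hpp>
#include <map>
#include <vector>

// Build a texture of size texSize by letting every source view "vote" on each
// pixel. Hs[i] maps texture coordinates into image coordinates of views[i].
cv::Mat buildTexture(const std::vector<cv::Mat>& views,
                     const std::vector<cv::Mat>& Hs,
                     cv::Size texSize)
{
    cv::Mat texture(texSize, CV_8UC3);

    for (int y = 0; y < texSize.height; ++y) {
        for (int x = 0; x < texSize.width; ++x) {
            // Histogram over coarsely quantised colours (8 levels per channel)
            // so small lighting differences still land in the same bin.
            std::map<int, int> hist;
            std::map<int, cv::Vec3b> sample;

            for (size_t i = 0; i < views.size(); ++i) {
                // Map the texture coordinate into view i.
                std::vector<cv::Point2f> src{cv::Point2f((float)x, (float)y)}, dst;
                cv::perspectiveTransform(src, dst, Hs[i]);

                cv::Point p(cvRound(dst[0].x), cvRound(dst[0].y));
                if (p.x < 0 || p.y < 0 ||
                    p.x >= views[i].cols || p.y >= views[i].rows)
                    continue;

                cv::Vec3b c = views[i].at<cv::Vec3b>(p);
                int bin = (c[0] / 32) * 64 + (c[1] / 32) * 8 + (c[2] / 32);
                hist[bin]++;
                sample[bin] = c;   // keep one representative colour per bin
            }

            // Pick the most frequent bin; an occluder (tree, car) seen in only
            // a few views is outvoted by the wall colour.
            int bestBin = -1, bestCount = 0;
            for (const auto& kv : hist)
                if (kv.second > bestCount) { bestBin = kv.first; bestCount = kv.second; }

            texture.at<cv::Vec3b>(y, x) =
                (bestBin >= 0) ? sample[bestBin] : cv::Vec3b(0, 0, 0);
        }
    }
    return texture;
}
```

The homographies Hs[i] can be computed directly from the four clicked corners with cv::getPerspectiveTransform(), or with cv::findHomography() if more correspondences are clicked.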

Reconstruction plans

Today I'm really sick, so I decided to stay home and start programming some of the ideas I've gotten from all the studying of the last two weeks.

I'll start by creating an interface where people can click on two photos to mark corresponding points for reconstructing a building. In the background, I'll be using some of the stuff I mentioned in the previous post to find the movements of the camera.

To find the camera position and rotation, I'll be using RANSAC or the 8-point algorithm to get the fundamental matrix (F), and then use the intrinsic camera parameters to get the essential matrix (E). From E, the rotation and translation of the cameras can be derived.
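
As a rough sketch of that pipeline (assuming OpenCV's newer C++ API; the helper name is just for illustration, and cv::recoverPose takes care of picking the physically valid decomposition):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Estimate the relative camera pose from clicked point matches.
// pts1/pts2 are corresponding points in the two photos, K is the intrinsic
// (calibration) matrix. Returns R and t of the second camera w.r.t. the first.
void relativePose(const std::vector<cv::Point2f>& pts1,
                  const std::vector<cv::Point2f>& pts2,
                  const cv::Mat& K, cv::Mat& R, cv::Mat& t)
{
    // Fundamental matrix from at least 8 matches, with RANSAC to reject
    // bad clicks / outliers.
    cv::Mat mask;
    cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99, mask);

    // Essential matrix from the intrinsic parameters: E = K^T * F * K.
    cv::Mat E = K.t() * F * K;

    // Decompose E (internally via SVD) and pick the one of the four possible
    // (R, t) solutions that puts the points in front of both cameras.
    cv::recoverPose(E, pts1, pts2, K, R, t, mask);
}
```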

The reconstruction of 3D points depends on prior scene and camera knowledge. If the camera parameters hadn't been known from calibration, the reconstruction could only be determined up to an unknown projective transformation of the environment. The intrinsic parameters give us the possibility to reconstruct up to an unknown scaling factor, while an unambiguous reconstruction can be done if both intrinsic and extrinsic parameters are known (for example if you have architectural plans available).

In the situation with calibrated cameras but unknown scene parameters (such as 3D point locations, scale, rotation etc.), we don't know the baseline of the system and can therefore not recover the true scale of the viewed scene. If we know the distance between two points in the scene, though, we can at least recover the scaling factor, which makes the reconstruction unique.
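
A tiny sketch of that last step, assuming we have two reconstructed 3D points and a distance measured on the real building (e.g. a door width):

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>

// Recover the global scale of an up-to-scale reconstruction from one known
// distance. X1, X2 are reconstructed 3D points, knownDistance is the measured
// distance between the same two points on the real building.
double recoverScale(const cv::Point3d& X1, const cv::Point3d& X2,
                    double knownDistance)
{
    cv::Point3d d = X2 - X1;
    double reconstructed = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return knownDistance / reconstructed;  // multiply all points (and t) by this
}
```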

Wednesday, August 30, 2006

3D Reconstruction from Photos

Something I thought would be a simple problem turned out not to be - but mostly because of my own clumsiness. I just noticed a function in OpenCV that I could've used instead of spending time on programming it myself.

I found that through OpenCV, I can use the fundamental matrix - retrieved from at least seven point matches in two photos - and take the SVD (Singular Value Decomposition) of it as a step towards finding the rotation and translation between the photos. Since I didn't find the SVD in OpenCV at first, I clumsily assumed that it wasn't in the library. After a day, I accidentally came across it, but in what seems to be a slower version - cvmSVD() - instead of what I just found, the cvSVD() function :-)

I'll now find the fundamental matrix from point matches, use the internal camera parameters to normalize it into the essential matrix, and then use SVD to find the rotation and translation...
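
For reference, the decomposition itself looks roughly like the sketch below (using the newer cv::SVD class rather than cvmSVD()/cvSVD(); it gives two candidate rotations and a translation known only up to sign, and the right combination still has to be picked by checking that triangulated points end up in front of both cameras):

```cpp
#include <opencv2/opencv.hpp>

// Decompose an essential matrix E = U * diag(1,1,0) * Vt into the two
// candidate rotations and the translation direction (up to sign and scale).
void decomposeEssential(const cv::Mat& E, cv::Mat& R1, cv::Mat& R2, cv::Mat& t)
{
    cv::Mat U, S, Vt;
    cv::SVD::compute(E, S, U, Vt);

    // Make sure U and Vt correspond to proper rotations (det = +1).
    if (cv::determinant(U) < 0)  U  *= -1.0;
    if (cv::determinant(Vt) < 0) Vt *= -1.0;

    // The standard "W trick": R = U*W*Vt or U*W^T*Vt, t = +/- last column of U.
    cv::Mat W = (cv::Mat_<double>(3, 3) << 0, -1, 0,
                                           1,  0, 0,
                                           0,  0, 1);
    R1 = U * W * Vt;
    R2 = U * W.t() * Vt;
    t  = U.col(2).clone();
}
```

Newer OpenCV versions also wrap this up in cv::decomposeEssentialMat() and cv::recoverPose().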

We'll see how it works - at least it gave me hope of being done in two weeks, as I had planned from the start :-)