Today I'm really sick, so I decided to stay home and start programming some of the ideas I've gotten from all my studying over the last two weeks.
I'll start by creating an interface where people can click corresponding points in two photos, which will then be used to reconstruct a building. In the background, I'll be using some of the techniques I mentioned in the previous post to estimate the camera motion between the two views.
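Just to get going, here is a minimal sketch of how the point-picking part could look, using matplotlib's ginput. The file names, the number of points, and the assumption that the clicks in the two lists correspond pairwise are all placeholders for the real interface.

```python
import matplotlib.pyplot as plt

def pick_points(image_path, n_points=8, title="Click points"):
    """Display an image and let the user click n_points locations.

    Returns a list of (x, y) pixel coordinates in click order.
    """
    img = plt.imread(image_path)
    fig, ax = plt.subplots()
    ax.imshow(img)
    ax.set_title(f"{title} ({n_points} clicks)")
    points = plt.ginput(n_points, timeout=0)  # block until all points are clicked
    plt.close(fig)
    return points

# Hypothetical file names; clicks are assumed to correspond pairwise.
pts_left = pick_points("left.jpg", n_points=8, title="Left photo")
pts_right = pick_points("right.jpg", n_points=8, title="Right photo")
```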
To find the camera position and rotation, I'll estimate the fundamental matrix (F) with the 8-point algorithm wrapped in RANSAC to reject bad correspondences, and then use the intrinsic camera parameters to compute the essential matrix (E). From E, the relative rotation and translation of the cameras can be recovered.
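A rough sketch of that pipeline with OpenCV is below. The correspondences pts_left/pts_right are assumed to come from the click interface above, and the intrinsic matrix K from a prior calibration (the values shown are placeholders).

```python
import numpy as np
import cv2

# Assumed inputs: clicked correspondences (N x 2) and the calibrated intrinsics K.
pts1 = np.asarray(pts_left, dtype=np.float64)
pts2 = np.asarray(pts_right, dtype=np.float64)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])  # placeholder calibration values

# Fundamental matrix from the 8-point algorithm inside a RANSAC loop;
# the mask flags which correspondences were kept as inliers.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)

# Essential matrix from the intrinsics: E = K^T F K.
E = K.T @ F @ K

# Decompose E into the relative rotation R and the (unit-length) translation t.
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
```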
The reconstruction of 3D points depends on prior scene and camera knowledge. If the camera parameters are not known from calibration, the reconstruction is only determined up to an unknown projective transformation of the scene. The intrinsic parameters give us the possibility to reconstruct up to an unknown scale factor, while an unambiguous reconstruction is possible when both intrinsic and extrinsic parameters are known (for example when architectural plans are available).
In the situation with calibrated cameras but unknown scene parameters (such as 3D point locations, scale, rotation, etc.), we don't know the baseline of the system and therefore cannot recover the true scale of the viewed scene. If we know the distance between two points in the scene, however, we can recover that scale factor and make the reconstruction unique.
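As a sketch of that last step, the triangulated points can be rescaled once a single real-world distance is known. The choice of points 0 and 1 and the 2.5 m distance are made-up examples; R, t, K, pts1 and pts2 are assumed to come from the pose-recovery step above.

```python
import numpy as np
import cv2

# Projection matrices in the first camera's frame; t from recoverPose has unit norm,
# so the triangulation below is in an arbitrary (unknown) scale.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

# Triangulate the clicked correspondences (OpenCV wants 2 x N arrays).
X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
X = (X_h[:3] / X_h[3]).T  # N x 3 Euclidean points, up to scale

# Hypothetical prior knowledge: points 0 and 1 are known to be 2.5 m apart.
known_distance = 2.5
measured_distance = np.linalg.norm(X[0] - X[1])
scale = known_distance / measured_distance
X_metric = X * scale  # reconstruction in metres
```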