I am currently trying to find the quickest camera calibration method available. The need for speed comes first and foremost from the fact that the application should run on a live video feed, which means every part of the algorithm that is not optimized can become a bottleneck.
If I had the cameras available, I could use a camera calibration pattern to find the needed intrinsic parameters. Obviously, I don't have that option.
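For reference, the standard pattern-based route would look roughly like the sketch below. It assumes OpenCV and a chessboard with 9x6 inner corners; the image file names are hypothetical.

```python
# Minimal sketch of pattern-based calibration, assuming OpenCV and a
# chessboard with 9x6 inner corners; the file names are hypothetical.
import glob
import cv2
import numpy as np

pattern_size = (9, 6)  # inner corners per row and column

# 3D coordinates of the corners on the planar chessboard (z = 0)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib_*.png"):  # hypothetical file names
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix; dist holds the distortion coefficients
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print("Intrinsic matrix K:\n", K)
```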
If I had some known measurements in the scene, I could relate them to the space they occupy in the projected image and calculate the necessary parameters from that. Of course, I have no such measurements in this case. I could possibly get them if absolutely necessary, but then I would probably have to deal with some politicians and red tape or the like...
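Just to make that concrete: under a simple pinhole model, one known measurement at a known distance already pins down the focal length. A back-of-the-envelope sketch (all numbers made up, and assuming square pixels and a roughly fronto-parallel object):

```python
# Focal length from one known measurement, assuming a pinhole model with
# square pixels and an object roughly parallel to the image plane.
# All numbers here are made up for illustration.

def focal_length_px(known_width_m, distance_m, width_in_pixels):
    """Pinhole projection: w_px = f * W / Z  =>  f = w_px * Z / W."""
    return width_in_pixels * distance_m / known_width_m

# e.g. a 12 m wide facade, 100 m away, spanning 300 pixels in the image
f = focal_length_px(known_width_m=12.0, distance_m=100.0, width_in_pixels=300.0)
print(f"Estimated focal length: {f:.1f} px")  # -> 2500.0 px
```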
One option I am considering is to reconstruct the environment as a 3D model, where I could put the finished models on top and then simply smile and be happy. This could actually turn out to be the most interesting option in the end, since it would also make it possible to pause the live video feed and let the user "fly down" for a closer look at the environment anywhere they choose. I believe I would enjoy the result of this :-)
Another option is to find the calibration matrix "backwards": we already have the three vanishing points, which can be used to find the image of the absolute conic, and in turn the camera calibration matrix. This option will probably not be usable for a video sequence shot from a plane, with its minimal set of parallel lines. In an indoor scene, however, this would probably be my method of choice.
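For my own notes, here is a rough sketch of how that would go, assuming three mutually orthogonal vanishing points have already been detected, zero skew and square pixels. Each orthogonal pair v_i, v_j gives one linear constraint v_i' * omega * v_j = 0 on the image of the absolute conic omega = (K K')^-1, and K then falls out of a Cholesky factorization. The test data at the end is synthetic, built from a known K just to check the round trip.

```python
# Sketch: camera calibration matrix K from three orthogonal vanishing
# points, assuming zero skew and square pixels, so omega has the form
# [[a, 0, b], [0, a, c], [b, c, d]] up to scale.
import numpy as np

def K_from_vanishing_points(v1, v2, v3):
    rows = []
    for (xi, yi, zi), (xj, yj, zj) in ((v1, v2), (v1, v3), (v2, v3)):
        rows.append([xi * xj + yi * yj,   # coefficient of a
                     xi * zj + zi * xj,   # coefficient of b
                     yi * zj + zi * yj,   # coefficient of c
                     zi * zj])            # coefficient of d
    # omega (up to scale) spans the null space of the 3x4 system
    _, _, vt = np.linalg.svd(np.asarray(rows, float))
    a, b, c, d = vt[-1]
    omega = np.array([[a, 0, b], [0, a, c], [b, c, d]])
    if np.linalg.det(omega) < 0:  # fix the arbitrary sign of the scale
        omega = -omega
    # omega = K^-T K^-1, so with omega = L L^T (Cholesky), K = L^-T
    L = np.linalg.cholesky(omega)
    K = np.linalg.inv(L).T
    return K / K[2, 2]  # normalize so K[2, 2] = 1

# Synthetic check: vanishing points of the world axes are the columns
# of K R, so we can build them from a known K and rotation R.
K_true = np.array([[1000.0, 0, 960], [0, 1000.0, 540], [0, 0, 1]])
cx, sx = np.cos(0.3), np.sin(0.3)
cy, sy = np.cos(0.5), np.sin(0.5)
R = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]) @ \
    np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
v1, v2, v3 = (K_true @ R).T
print(K_from_vanishing_points(v1, v2, v3))  # should reproduce K_true
```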
I'll write back here when I've found more algorithms and decided which one to use - bear in mind that I haven't even seen the video sequence yet, or the 3D models we're supposed to insert ;-)