Powered by Squarespace
This form does not yet contain any fields.
    Thursday
    Jun102010

    Great ideas in theoretical computer science

    Hey. This post is a kind of an advertisement for the MIT course 6.080 (thanks to overrider for the link). I strongly recommend everyone who has any interest in computer science to read the lectures. Course covers a lot of awesome topics like Turing machines, computability, reducibility, quantum computers, completeness and incompleteness theorems and many more. Of course, if you are quite an awesome computer scientist, you’ll hardly find anything you haven’t heard before in the course. But I believe that almost anyone can find something interesting there. Now I’d like to share some facts from the lectures with you to interest you more.

    You almost certainly know that some problems are undecidable. Classic example of such a problem is a halting one. But how many problems are undecidable? Only this one? Or, may be, this one and six hundred other? Is the number of undecidable problems even countable? Well, it turns out that it’s not. While there exist countable number of algorithms (one can easily prove it by sorting programs written in some Turing-complete programming language in lexicographical order), set of possible problems has the cardinality of the continuum (proof of this fact is quite simple, it’s almost like the Georg Cantor’s proof that the set of the real numbers is not countable). So, decidable problems are like a drop in the ocean.

    Other interesting fact can be proven. For any possible computational complexity (n^2, n^(n! log n) etc) there exists at least one problem for which algorithm with such complexity is the best one. That problem is “does given algorithm accept given input in no more than K steps” and it can’t be solved faster than in O(K) by actually running the algorithm.

    Problem “does given statement have a proof of length N or less” is in NPC. So, if P=NP, we can choose reasonably big N and check for the existence of the proof (relatively) quickly. If there is no proof of reasonable length, we can conclude that problem is not of any interest. So, in fact, if P=NP, we should stop paying mathematicians for proofs because machines can probably do better. Unfortunately, nowadays almost everybody is quite certain that P is not equal to NP.

    Another interesting topic is derandomization. Does every randomized algorithm have a corresponding deterministic algorithm? Recent success in the proof that PRIMES is in P is closely related to this question.

    If the facts mentioned above make you excited, you should totally read the scribe notes. I’m out.

    Tuesday
    May182010

    An Introduction to MapReduce

    Today I’m going to write about MapReduce. It’s quite popular topic these days, and it’s the one I’m particularly interested in.

    Click to read more ...

    Tuesday
    Apr132010

    My new job at Yandex

    Well, I haven’t post anything here, being quite busy. It was mostly because of the fact I have quit from my previous job at BS Graphics R&D department. And now I’m a software engineer at the Yandex company, largest Russian search engine which holds about 60% of the search market in Russia (Google, btw, has only about 20%). In Yandex I’ll be working with the image search team. At first I’ll be mostly involved in infrastructure related tasks, but in time I’ll probably start doing some interesting stuff closely connected to machine learning and computer vision areas.

    Here are some facts about Yandex, just in case:

    • Yandex has its own computer science and data analysis school which is available for students for free. Really awesome people like Alexey Chervonenkis or Albert Shiryaev read lectures there. And Maxim Babenko’s course on effective algorithms and data structures is the best I’ve ever seen.
    • Yandex hosts Internet Mathematics contest where interesting tasks somehow related to web-search are offered to participants. The goal of the contest this year is to predict the rate of traffic congestion based on previous observations. And last year contest was about learning a function that can predict the relevance of the document with respect to search query (it was just like the contest currently hosted by the Yahoo Labs). 
    • Yandex is one of the two main sponsors of the TopCoder 2010 event (the other one is the U.S. National Security Agency).

     I’ll make a post with photos from the main Yandex office (which looks really great) quite soon. You stay connected :)

    Sunday
    Mar072010

    Misclassification loss and AdaBoost

    I’ve met some people working with AdaBoost who was looking at the misclassification error vs. boosting iteration number plot and wondering: “Why misclassification error rate is increasing on some iterations? Is there something wrong with my AdaBoost implementation? Shouldn’t AdaBoost always decrease the error?” Relax, guys, it’s all right. Instead of optimizing misclassification loss L1(y, y’) = I[y != sign(y’)], AdaBoost optimizes exponential loss L2(y, y’) = exp(-y*y’). Decrease in exponential loss does not always lead to decrease in misclassification loss and vice versa. Nevertheless, small value of the total loss of one type give you some hope that the total loss of another type will be small too.

    Here is plot from the “Elements of Statistical Learning” book comparing these two kinds of loss function on some synthetic data.

    Monday
    Mar012010

    Bayesian model for soft keyboard enhancement

    Virtual keyboard on my iPhone has always appeared as something amazing to me. Its buttons are quite small, my thumbs are rather large but I still make very few typing errors with it. It was quite obvious that iPhone OS developers have integrated some smart algorithm into it. And yesterday I’ve accidentally come across an interesting article about practically the same thing (from Microsoft Research, though). Idea in the paper is very nice and simple, so it practically forces me to post something :)

    Imagine user has entered a sequence of symbols (k1,…,kn)=H (so called typing history) and then he enters new symbol by touching device screen at position l=(x, y). We’ll determine the intended symbol k by maximizing expression P(k | H) P(l | k) with respect to k. Here P(k | H) is the probability of observing symbol k given sequence of previously entered symbols (usually only last 6 symbols are considered). It can be estimated from any text corpus. P(l | k) is the probability of touching screen position l when symbol k is intended. In the paper it is modelled as a bivariate Gaussian distribution. It is also constrained in a special way to prevent total suppressing of very improbable keys (given the typing history).

    Approach like that can dramatically reduce amount of typing errors on soft keyboards. Numbers, charts and comparison with the simple “static” keyboard and state-of-art dynamic keyboards with unconstrained P(l | k) can be found in the paper.

    Tuesday
    Feb092010

    Camera DEcalibration and REcalibration

    In one of the recent posts I have covered the topic of camera calibration a little. But that was not really interesting. Tons of very good books have been written about that. Today we’ll look into keeping your cameras in calibrated state after they’ve been calibrated instead.

    Suppose you have a 3d-tracking system that uses about 20 calibrated cameras installed in some crowded place like bank office or shopping mall. In such places calibrated cameras have a tendency to become uncalibrated sometimes, mostly due to vibrations caused by factors like construction work or cleaning. What should we do about that? Obviously, we should recalibrate cameras whose position or orientation has been changed (or change position and orientation back to original). But how can we find out what cameras should be calibrated again? Should we hire a man who’ll check image on every camera every day? Fortunately, we shouldn’t.

    Camera decalibration can be detected automatically

    What we have to do is to save snapshot from camera at the moment its extrinsic parameters were estimated. Then the following algorithm can be used to check for decalibration:

    1. Extract something like SURF features from the saved snapshot and current camera image.
    2. Match extracted features.
    3. If not less than 10-15% of the matched points have the same pixel coords, camera is OK. Otherwise, it seems to be decalibrated.

    As you can see, this algorithm is fully automatic and can be run, for example, every hour for every camera. It is also quite robust to presence of some temporary objects like humans in both images because it requires only small amount of matches to have the same pixel coordinates. It seems that it works bad (or doesn’t work at all) only in cases when there are very few good features available (for example, when camera looks at the white wall). But in those cases it’s very hard to detect camera decalibration even for human. One can also make this algorithm robust to illumination changes by using some lighting-invariant point descriptors or by image preprocessing.

    OK, we have determined that camera has become decalibrated. And we also have old camera position, orientation and point matches between current and old camera images. It seems that we can automatically determine new camera position easily. Or can’t we?

    Camera can not be recalibrated automatically

    And here is why. Of course, there is a 3d reconstruction algorithm that takes a list of coordinates of matches together with camera intrinsic matrix and returns transform from old camera view space into view space of the new one (together with 3d coordinates of the points). Unfortunately, it can calculate transform and 3d coordinates only up to scale. Neither coordinates of the matched pixels nor intrinsic matrix contain enough information about the scale of you scene. Are you measuring everything in meters? Or in inches? You need more information (like real 3d coordinates of one of the matched points) to rescale algorithm results. But you don’t have them. So the reconstruction algorithm gives you pretty much nothing (well, it gives you correct camera orientation, but you can’t use it without rescaled translation).

    It’s interesting that I’ve failed to find any papers covering the topic of decalibration and recalibration although it’s quite important for some practical computer vision applications. Should I write a short paper about it by myself?

    Sunday
    Jan242010

    The nature of boosting

    Most of the people involved into computer vision field are somehow familiar with boosting. They know that boosting is a powerful ML algorithm and should be used in cases when there are many weak learners available, but only few of them are discriminative enough. They also know that boosting achieves its goals by managing weight distribution on training samples: when new weak classifier is added to the linear combination, samples in the training set are reweighed in a way that samples misclassified by the new classifier will have their weights increased and sample classified correctly - decreased. Those sample weights are used when error of the weak classifier over the training set is calculated. So each new weak classifier is focused mostly on the samples which were misclassified by its colleagues.

    That’s the very good explanation of what AdaBoost algorithm does, but to understand boosting better you should get deeper. There are another interpretations of boosting algorithm available, and some of them are really interesting.

    Boosting as a forward stagewise additive modelling

    Boosting can be viewed as a forward stagewise additive modelling process. Let’s assume we have some loss function L(y,y’) and want to find additive combination of weak classifiers that minimizes it. One way to do it is to use greedy approach. First, let’s select single weak classifier h1(x) that minimizes loss function L(y, h1(x)) over the training set. Then, let’s select weak classifier h2(x) that minimizes L(y, h1(x) + h2(x)). The next classifier, h3(x), should minimize L(y, h1(x) + h2(x) + h3(x)) and so on. This method has a great connection to boosting (explored, for example, in this paper or in this great book): if you choose exponential loss function L(y, y’) = exp(-y*y’), you’ll get exactly the AdaBoost algorithm.

    Boosting as a gradient descent

    At each step of boosting we’re selecting weak classifier that maximally reduces loss over the training set. So, we’re in fact moving in the strong classifier space in the direction of the loss function decrease. Each weak classifier selection here can be viewed as a gradient descent step. Approaches like gradient boosting are entirely based on this observation. In gradient boosting, gradient descent is performed semi-explicitly: strong classifier is represented by a vector of its values on the training set samples. Then, at each boosting step partial derivatives of the loss function (with respect to classifier already trained) are calculated, and regression tree is fitted to the gradient value in a least-squares manner. That regression tree is than selected as a weak classifier.

    Boosting and game theory

    Boosting has several connections with the game theory. Consider matrix Mij with number of rows equal to the number of samples in the training set and number of columns equal to the number of weak classifiers. Mij will be equal to 1 if j-th weak classifier classifies i-th sample correctly (otherwise it will have value of 0). Then it can be shown using von Neumann’s minimax theorem (applied to the game in mixed strategies with payoff matrix M) that if for every possible weight distribution there exists weak learner with error less than 1/2-eps, than there exists a convex combination of weak learners with a margin of at least 2*eps for every training sample. It means that AdaBoost has at least a potential for success. One can go further and discover that boosting can be viewed as a special case of an algorithm for approximately solving matrix games. Both topics are covered in this paper very well.

    Connection to game theory also leads us to idea of using linear or convex programming techniques in boosting. Some approaches are based on it, but I’m not familiar with them very well.

    Wednesday
    Jan202010

    Localization and mapping

    Hey, have you ever seen George Klein’s SLAM engine in action? I can’t even call it awesome. It’s better. And it’s not just the algorithm itself. Just look into possibilities it opens.

    Friday
    Dec252009

    An interesting approach to camera calibration

    Camera calibration is a process of determining intrinsic (like principal point or focal length) and extrinsic (position and orientation in space) parameters of a camera, which is often described by the pinhole camera model. In computer vision we usually perform calibration by analyzing images taken from camera. The most widely used approach to camera calibration is based on this paper by Zhengyou Zhang. It involves chessboard (or some other planar calibration pattern) and consists of the following steps:

    1. Find a chessboard (bigger is better). Note that it should have distinct width and height (measured in chessboard squares) which should both be even (otherwise you will be unable to determine chessboard orientation given its picture).
    2. Find a “calibration dude”.
    3. Calibration dude takes a chessboard and waves it in front of the camera attached to a computer with calibration software installed. Calibration software takes about 20 images with distinct chessboard orientations (camera orientation remains the same, of course), finds inner corners of the chessboard on every image and then uses them together with information about real-world size of chessboard to determine intrinsic camera parameters.
    4. Calibration dude puts chessboard on the floor in a way camera still can see it. Calibration software than takes one more image from camera, finds chessboard corners on it (again) and calculates camera position assuming that some predefined chessboard corner is located at the coordinate system origin and chessboard sides are oriented towards coordinate system axes. Of course, any other chessboard orientation can be specified in software, but this one is the most simple.

    What problems do we have there? First of all, camera can see no floor at all, so we can’t just put chessboard on it during step 4. Instead we need to set it up somewhere else, not on the ground level. We should then carefully measure its position and orientation and pass them as an input to the calibration tool.

    What if we have more than one camera seeing no floor, and those cameras are not overlapping? In this case we should repeat process described above for each camera, carefully measuring chessboard position in the world coordinate system every time. In fact, it’s a pain in the ass. Calibrating multiple cameras that way can be really slow and error-prone.

    Much more interesting approach to multiple non-overlapping camera calibration was proposed in this paper. Its key idea is to fix chessboard position (put it at the origin) and move mirror instead. Cameras will see chessboard reflection in that mirror and use reflected image for calibration. Of course, some questions arise.

    1. Is it legal to determine intrinsic camera parameters using reflected chessboard image? Answer is simple: yes. Authors prove that common calibration techniques give same result (except of coordinate system handedness) when applied to mirrored images.
    2. Don’t we need to know position and orientation of the mirror when calibrating extrinsic parameters? No, we don’t. It turns out that every mirrored chessboard image imposes constraint on the position and orientation of the real camera. And if we have five (or more) such images, we can reconstruct position and orientation without any knowledge of mirror position.

    This approach can save a lot of time and help to reduce part of the calibration error that arises from incorrect chessboard position and orientation determination. But it has it’s own drawbacks, of course. First of all, mirror is rather heavy. It’s not easy to manipulate it if your calibration dude is not a beefcake. Next, it’s hard to change orientation of the calibration pattern in frame from one snapshot to another when using mirror. It has to be in the field of view of the camera, and oriented such that the pattern’s image is reflected into the camera. These requirements may result in little variation in the pattern orientations as seen by the camera in the mirror and lead to solution degeneration.

    Despite the drawbacks, this approach has the potential to increase speed of the multiple camera calibration process a lot. We will probably try it in 2010.

    Ram Krishan Kumar, Adrian Ilie, Jan-Michael Frahm, & Marc Pollefeys (2008). Simple calibration of non-overlapping cameras with a mirror 2008 IEEE Conference on Computer Vision and Pattern Recognition
    Wednesday
    Dec232009

    Some strange art detected

    Few days ago at the office building where our company’s office is located was an exhibition of some guy who draws labyrinths and then sticks pieces of old hardware (like old CPUs or floppy disks) to his paintings. I think his creative work is quite weird, but maybe you’ll like it, who knows :)

    Click to read more ...