<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace Site Server v5.11.5 (http://www.squarespace.com/) on Sat, 04 Sep 2010 13:21:31 GMT--><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><title>Sex, drugs and applied science</title><link>http://sexdrugsandappliedscience.com/blog/</link><description>Machine learning, computer vision and other stuff that rocks.</description><lastBuildDate>Fri, 09 Jul 2010 12:59:47 +0000</lastBuildDate><copyright></copyright><language>en-US</language><generator>Squarespace Site Server v5.11.5 (http://www.squarespace.com/)</generator><item><title>Great ideas in theoretical computer science</title><category>courses</category><category>lectures</category><category>math</category><dc:creator>hr0nix</dc:creator><pubDate>Thu, 10 Jun 2010 16:50:57 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/6/10/great-ideas-in-theoretical-computer-science.html</link><guid isPermaLink="false">376081:4052606:7942659</guid><description><![CDATA[<p>Hey. This post is a kind of an advertisement for the MIT course <a href="http://stellar.mit.edu/S/course/6/sp08/6.080/index.html">6.080</a>&nbsp;(thanks to&nbsp;<a href="http://computerblindness.blogspot.com">overrider</a> for the link). I strongly recommend everyone who has any interest in computer science to read the lectures. Course covers a lot of awesome topics like Turing machines, computability, reducibility, quantum computers, completeness and incompleteness theorems and many more. Of course, if you are quite an awesome computer scientist, you&#8217;ll hardly find anything you haven&#8217;t heard before in the course. But I believe that almost anyone can find something interesting there. Now&nbsp;I&#8217;d like to share some facts from the lectures with you to interest you more.</p>
<p>You almost certainly know that some problems are <a href="http://en.wikipedia.org/wiki/Undecidable_problem">undecidable</a>. The classic example of such a problem is the <a href="http://en.wikipedia.org/wiki/Halting_problem">halting problem</a>. But how many problems are undecidable? Only this one? Or maybe this one and six hundred others? Is the number of undecidable problems even countable? Well, it turns out that it&#8217;s not. While the set of algorithms is countable (one can easily prove it by listing the programs written in some Turing-complete programming language ordered by length and then lexicographically), the set of possible problems has the cardinality of the continuum (the proof of this fact is quite simple; it&#8217;s almost like Georg Cantor&#8217;s <a href="http://en.wikipedia.org/wiki/Cardinality_of_the_continuum">proof</a> that the set of real numbers is not countable). So decidable problems are like a drop in the ocean.</p>
<p>Another interesting fact can be proven. For any reasonable computational complexity (n^2, n^(n! log n), etc.) there exists at least one problem for which an algorithm of that complexity is essentially the best one. That problem is &#8220;does the given algorithm accept the given input in no more than K steps&#8221;, and it can&#8217;t be solved substantially faster than in O(K), that is, by actually running the algorithm.</p>
<p>The problem &#8220;does the given statement have a proof of length N or less&#8221; is NP-complete. So, if P=NP, we can choose a reasonably big N and check for the existence of a proof (relatively) quickly. If there is no proof of reasonable length, we can conclude that the statement is not of any interest. So, in fact, if P=NP, we should stop paying mathematicians for proofs because machines can probably do better. Unfortunately, nowadays almost everybody is quite certain that P is not equal to NP.</p>
<p>Another interesting topic is derandomization. Does every randomized algorithm have a corresponding deterministic algorithm? The recent proof that <a href="http://en.wikipedia.org/wiki/AKS_primality_test">PRIMES is in P</a> is closely related to this question.</p>
<p>If the facts mentioned above make you excited, you should totally read the scribe notes. I&#8217;m out.</p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-7942659.xml</wfw:commentRss></item><item><title>An Introduction to MapReduce</title><category>books</category><category>mapreduce</category><category>programming</category><dc:creator>hr0nix</dc:creator><pubDate>Tue, 18 May 2010 15:11:27 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/5/18/an-introduction-to-mapreduce.html</link><guid isPermaLink="false">376081:4052606:7711784</guid><description><![CDATA[<div id="_mcePaste">Today I&#8217;m going to write about <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>. It&#8217;s quite popular topic these days, and it&#8217;s the one I&#8217;m particularly interested in.</div>
<div id="_mcePaste"></div>
<div id="_mcePaste">MapReduce is a programming model for performing distributed computations on huge amounts of data, originally invented by Google. Currently it&#8217;s used by almost everyone interested in such activities. Its users include Facebook, EHarmony, Yahoo! (their MapReduce implementation, <a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a>, is currently developed and supported by the Apache Software Foundation), IBM, Twitter, AOL and many more. As you can see, almost all of the biggest internet companies have to mine data, and MR is one good way to do it.</div>
<div id="_mcePaste"></div>
<div id="_mcePaste">The key ideas behind MapReduce are <em>really</em> simple compared to almost any other distributed processing paradigm. All the data is stored as tables of key/value pairs in a distributed file system. MapReduce programs process those tables with only two types of operations (inspired by functional programming): map and reduce. Every map operation takes one key/value pair as its input and can emit zero or more key/value pairs into the output table. A reduce operation takes a key and the list of all values associated with that key and can also emit zero or more output pairs. All of this runs on a cluster where each node stores a portion of the data, which can be processed locally or sent to another node. Data management is fully transparent to the MapReduce user: the execution environment takes care of it.</div>
<div id="_mcePaste"></div>
<div id="_mcePaste">The canonical MapReduce application example is a program that counts word occurrences in a list of documents. Let&#8217;s assume we have a list of [docid, text]&nbsp;pairs stored on a MapReduce cluster. We can then define the map and reduce operations as follows:</div>
<div id="_mcePaste" style="padding-left: 30px;">Map (docid, text) : <strong>foreach</strong> distinct word <strong>in</strong> text <strong>emit</strong> [word, count(word, text)]</div>
<div id="_mcePaste" style="padding-left: 30px;"></div>
<div style="padding-left: 30px;">Reduce (word, counts) : <strong>emit</strong> [word, sum(counts)]</div>
<div><span class="full-image-block ssNonEditable"><span>&nbsp;The map operation takes every distinct word in the document and emits it paired with the number of its occurrences in the text (emitting once per distinct word, rather than once per occurrence, keeps repeated words from being counted several times). The reduce operation then just sums up all the counts for each word.</span></span></div>
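<div>The two operations above can be tried out on a single machine. Below is a minimal Python sketch, not a real MapReduce job: <code>run_mapreduce</code> is a hypothetical toy driver that stands in for the cluster runtime, and the names <code>map_doc</code> and <code>reduce_word</code> as well as the two sample documents are made up for illustration.</div>

```python
from collections import Counter, defaultdict

def map_doc(docid, text):
    """Map: emit one (word, count) pair per distinct word in the document."""
    for word, count in Counter(text.lower().split()).items():
        yield word, count

def reduce_word(word, counts):
    """Reduce: sum the per-document counts collected for one word."""
    yield word, sum(counts)

def run_mapreduce(documents, mapper, reducer):
    """Toy single-process driver (an assumption, not a cluster runtime)."""
    grouped = defaultdict(list)
    for docid, text in documents:
        for key, value in mapper(docid, text):
            grouped[key].append(value)  # group emitted values by key
    output = {}
    for key, values in grouped.items():
        for out_key, out_value in reducer(key, values):
            output[out_key] = out_value
    return output

documents = [(1, "to be or not to be"), (2, "to do is to be")]
word_counts = run_mapreduce(documents, map_doc, reduce_word)
```

<div>In a real deployment the grouping step is done by the framework across many nodes; only the map and reduce functions are written by the user.</div>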
<div></div>
<p style="text-align: center;"><img style="width: 450px;" src="http://sexdrugsandappliedscience.com/storage/post-images/MapReduce.png?__SQUARESPACE_CACHEVERSION=1274785136686" alt="" /></p>
<div id="_mcePaste"></div>
<div id="_mcePaste">Despite its simplicity, many quite interesting algorithms can be implemented on MapReduce. A few examples are inverted index creation, <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a> calculation, graph traversal and expectation-maximization for HMMs. For those who want to learn more, I recommend the book <em>Data-Intensive Text Processing with MapReduce</em> by Jimmy Lin and Chris Dyer, available <a href="http://www.umiacs.umd.edu/~jimmylin/book.html">here</a>&nbsp;(I&#8217;ve taken the illustration for this post from it). Many nice lectures covering all aspects of MapReduce are also available at the <a href="http://code.google.com/intl/ru/edu/parallel/index.html">Google Code University</a>.</div>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-7711784.xml</wfw:commentRss></item><item><title>My new job at Yandex</title><category>events</category><category>yandex</category><dc:creator>hr0nix</dc:creator><pubDate>Tue, 13 Apr 2010 13:31:36 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/4/13/my-new-job-at-yandex.html</link><guid isPermaLink="false">376081:4052606:7311532</guid><description><![CDATA[<p>Well, I haven&#8217;t post anything here, being quite busy. It was mostly because of the fact I have quit from my previous job at BS Graphics R&amp;D department. And now I&#8217;m a software engineer at the <a href="http://company.yandex.com/">Yandex</a>&nbsp;company, largest Russian search engine which holds about 60% of the search market in Russia (Google, btw, has only about 20%). In Yandex I&#8217;ll be working with the <a href="http://images.yandex.ru/">image search</a> team. At first I&#8217;ll be mostly involved in infrastructure related tasks, but in time I&#8217;ll probably start doing some interesting stuff closely connected to machine learning and computer vision areas.</p>
<p>Here are some facts about Yandex, just in case:</p>
<ul>
<li>Yandex has its own <a href="http://company.yandex.com/press_center/press_releases/2007/2007-07-25.xml">computer science and data analysis school</a>, which is free for students. Really awesome people like <a href="http://en.wikipedia.org/wiki/Alexey_Chervonenkis">Alexey Chervonenkis</a> or <a href="http://en.wikipedia.org/wiki/Albert_Shiryaev">Albert Shiryaev</a> give lectures there. And <a href="http://adde.math.msu.su/max/">Maxim Babenko</a>&#8217;s course on efficient algorithms and data structures is the best I&#8217;ve ever seen.</li>
<li>Yandex hosts the <a href="http://imat2010.yandex.ru/en">Internet Mathematics</a> contest, where participants are offered interesting tasks somehow related to web search. The goal of this year&#8217;s <a href="http://imat2010.yandex.ru/en">contest</a> is to predict the rate of traffic congestion based on previous observations. Last year&#8217;s <a href="http://imat2009.yandex.ru/en">contest</a> was about learning a function that predicts the relevance of a document with respect to a search query (it was just like the <a href="http://learningtorankchallenge.yahoo.com/">contest</a> currently hosted by Yahoo Labs).&nbsp;</li>
<li>Yandex is one of the two main sponsors of the <a href="http://www.topcoder.com/tc?module=Static&amp;d1=tournaments&amp;d2=tco10&amp;d3=overview&amp;d4=sponsor1">TopCoder 2010</a> event (the other one is the U.S. National Security Agency).</li>
</ul>
<p>&nbsp;I&#8217;ll make a post with photos from the main Yandex office (which looks really great) quite soon. Stay tuned :)</p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-7311532.xml</wfw:commentRss></item><item><title>Misclassification loss and AdaBoost</title><category>boosting</category><category>classification</category><dc:creator>hr0nix</dc:creator><pubDate>Sun, 07 Mar 2010 16:50:46 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/3/7/misclassification-loss-and-adaboost.html</link><guid isPermaLink="false">376081:4052606:6935499</guid><description><![CDATA[<p>I&#8217;ve met some people working with AdaBoost who was looking at the misclassification error vs. boosting iteration number plot and wondering: &#8220;Why misclassification error rate is increasing on some iterations? Is there something wrong with my AdaBoost implementation? Shouldn&#8217;t AdaBoost always decrease the error?&#8221; Relax, guys, it&#8217;s all right. Instead of optimizing misclassification loss L1(y, y&#8217;) = I[y != sign(y&#8217;)], AdaBoost optimizes exponential loss L2(y, y&#8217;) = exp(-y*y&#8217;). Decrease in exponential loss does not always lead to decrease in misclassification loss and vice versa. Nevertheless, small value of the total loss of one type give you some hope that the total loss of another type will be small too.</p>
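<p>A tiny numeric illustration of that point, as a sketch with made-up numbers (the labels and scores below are not from any real boosting run): between the two score vectors the exponential loss goes down while the misclassification rate goes up.</p>

```python
import math

def misclassification_rate(labels, scores):
    """Mean of I[y != sign(y')] for labels in {-1, +1} and real-valued scores."""
    return sum(1 for y, s in zip(labels, scores) if y * s <= 0) / len(labels)

def exponential_loss(labels, scores):
    """Mean of exp(-y * y') over the samples."""
    return sum(math.exp(-y * s) for y, s in zip(labels, scores)) / len(labels)

labels = [1, 1, 1]
scores_before = [-2.0, 0.1, 0.1]   # one sample is badly wrong, two barely right
scores_after = [-0.5, -0.1, 0.1]   # the bad sample improved, another one flipped

err_before = misclassification_rate(labels, scores_before)
err_after = misclassification_rate(labels, scores_after)
exp_before = exponential_loss(labels, scores_before)
exp_after = exponential_loss(labels, scores_after)
```

<p>The exponential loss is dominated by the sample with the large negative margin, so pulling that sample towards the boundary pays off even if another sample slips to the wrong side.</p>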
<p>Here is a plot from the &#8220;Elements of Statistical Learning&#8221; book comparing these two kinds of loss functions on some synthetic data.</p>
<p><span class="full-image-block ssNonEditable"><span><img style="width: 500px; float: left;" src="http://sexdrugsandappliedscience.com/storage/post-images/ExpLossPlot.png?__SQUARESPACE_CACHEVERSION=1267982058842" alt="" /></span></span></p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6935499.xml</wfw:commentRss></item><item><title>Bayesian model for soft keyboard enhancement</title><category>bayesian</category><category>papers</category><dc:creator>hr0nix</dc:creator><pubDate>Mon, 01 Mar 2010 13:48:38 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/3/1/bayesian-model-for-soft-keyboard-enhancement.html</link><guid isPermaLink="false">376081:4052606:6874556</guid><description><![CDATA[<p>Virtual keyboard on my iPhone has always appeared as something amazing to me. Its buttons are quite small, my thumbs are rather large but I still make very few typing errors with it. It was quite obvious that iPhone OS developers have integrated some smart algorithm into it. And yesterday I&#8217;ve accidentally come across an interesting <a href="http://portal.acm.org/citation.cfm?id=1719986&amp;coll=GUIDE&amp;dl=GUIDE&amp;CFID=79930538&amp;CFTOKEN=35030389&amp;ret=1#Fulltext">article</a> about practically the same thing (from Microsoft Research, though). Idea in the paper is very nice and simple, so it practically forces me to post something :)</p>
<p>Imagine the user has entered a sequence of symbols (<strong>k1</strong>,&#8230;,<strong>kn</strong>)=<strong>H</strong> (the so-called typing history) and then enters a new symbol by touching the device screen at position <strong>l</strong>=(<strong>x</strong>, <strong>y</strong>). We&#8217;ll determine the intended symbol <strong>k</strong> by maximizing the expression P(<strong>k</strong> | <strong>H</strong>) P(<strong>l</strong> | <strong>k</strong>) with respect to <strong>k</strong>. Here P(<strong>k</strong> | <strong>H</strong>) is the probability of observing symbol <strong>k</strong> given the sequence of previously entered symbols (usually only the last 6 symbols are considered). It can be estimated from any text corpus. P(<strong>l</strong> | <strong>k</strong>) is the probability of touching screen position <strong>l</strong> when symbol <strong>k</strong> is intended. In the paper it is modelled as a bivariate Gaussian distribution. It is also constrained in a special way to prevent very improbable keys (given the typing history) from being suppressed completely.</p>
<p>An approach like that can dramatically reduce the number of typing errors on soft keyboards. Numbers, charts and a comparison with the simple &#8220;static&#8221; keyboard and with state-of-the-art dynamic keyboards with unconstrained P(<strong>l</strong> | <strong>k</strong>) can be found in the paper.</p>
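<p>Here is a minimal Python sketch of that decision rule. Everything concrete in it is an assumption made for illustration: two keys with made-up centres, an isotropic Gaussian with a hand-picked <code>sigma</code>, and a hard-coded prior standing in for a language model. The special constraint from the paper is not modelled.</p>

```python
import math

def touch_likelihood(touch, center, sigma):
    """P(l | k): isotropic bivariate Gaussian centred on the key."""
    dx, dy = touch[0] - center[0], touch[1] - center[1]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))

def decode_key(touch, key_centers, key_prior, sigma=0.5):
    """Return the key k that maximizes P(k | H) * P(l | k)."""
    return max(
        key_centers,
        key=lambda k: key_prior[k] * touch_likelihood(touch, key_centers[k], sigma),
    )

key_centers = {"e": (2.0, 0.0), "r": (3.0, 0.0)}  # hypothetical key layout
touch = (2.55, 0.0)                               # slightly closer to "r"
prior_after_th = {"e": 0.9, "r": 0.1}             # made-up P(k | H) for H = "th"
uniform_prior = {"e": 0.5, "r": 0.5}              # a "static" keyboard

with_history = decode_key(touch, key_centers, prior_after_th)
without_history = decode_key(touch, key_centers, uniform_prior)
```

<p>With the uniform prior the nearest key wins, while the history-aware prior pulls the decision towards the more plausible letter.</p>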
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6874556.xml</wfw:commentRss></item><item><title>Camera DEcalibration and REcalibration</title><category>ideas</category><category>thoughts</category><category>vision</category><dc:creator>hr0nix</dc:creator><pubDate>Tue, 09 Feb 2010 09:18:18 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/2/9/camera-decalibration-and-recalibration.html</link><guid isPermaLink="false">376081:4052606:6621110</guid><description><![CDATA[<p>In <a href="http://sexdrugsandappliedscience.com/blog/2009/12/25/an-interesting-approach-to-camera-calibration.html">one</a> of the recent posts I have covered the topic of camera calibration a little. But that was not really interesting. Tons of <a href="http://www.amazon.com/Multiple-View-Geometry-Computer-Vision/dp/0521623049">very</a> <a href="http://www.amazon.com/Geometry-Multiple-Images-Formation-Applications/dp/0262062208">good</a> books have been written about that. Today we&#8217;ll look into keeping your cameras in calibrated state after they&#8217;ve been calibrated instead.</p>
<p>Suppose you have a 3d-tracking system that uses about 20 calibrated cameras installed in some crowded place like a bank office or a shopping mall. In such places calibrated cameras tend to become uncalibrated from time to time, mostly due to vibrations caused by things like construction work or cleaning. What should we do about that? Obviously, we should recalibrate the cameras whose position or orientation has changed (or move them back to the original position and orientation). But how can we find out which cameras should be calibrated again? Should we hire someone to check the image from every camera every day? Fortunately, we shouldn&#8217;t.</p>
<h3>Camera decalibration can be detected automatically</h3>
<p>What we have to do is save a snapshot from the camera at the moment its extrinsic parameters are estimated. Then the following algorithm can be used to check for decalibration:</p>
<ol>
<li>Extract something like SURF features from the saved snapshot and current camera image.</li>
<li>Match extracted features.</li>
<li>If at least 10-15% of the matched points have the same pixel coordinates in both images, the camera is OK. Otherwise, it appears to be decalibrated.</li>
</ol>
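<p>The last step of the list above can be sketched in a few lines of Python with NumPy. This is only a sketch under stated assumptions: feature extraction and matching are taken as already done, so the function receives two arrays of matched pixel coordinates; the function name, the one-pixel tolerance and the 10% threshold are illustrative choices, not tuned values.</p>

```python
import numpy as np

def looks_calibrated(saved_pts, current_pts, tol_px=1.0, min_static_fraction=0.1):
    """Return True if enough matched features stayed at the same pixel position.

    saved_pts, current_pts: (N, 2) arrays of matched pixel coordinates,
    where row i of one array is matched to row i of the other.
    """
    saved_pts = np.asarray(saved_pts, dtype=float)
    current_pts = np.asarray(current_pts, dtype=float)
    if len(saved_pts) == 0:
        return False  # no matches at all: nothing supports "still calibrated"
    displacement = np.linalg.norm(current_pts - saved_pts, axis=1)
    static_fraction = np.mean(displacement <= tol_px)
    return bool(static_fraction >= min_static_fraction)

saved = np.array([[10.0 * i, 5.0 * i] for i in range(20)])

# Scene with people walking around: most matches moved, a quarter stayed put.
crowded = saved.copy()
crowded[5:] += 40.0

# Camera got bumped: every match is shifted by the same offset.
bumped = saved + np.array([7.0, -3.0])

crowded_ok = looks_calibrated(saved, crowded)
bumped_ok = looks_calibrated(saved, bumped)
```

<p>A small pixel tolerance is used instead of exact equality because feature detectors rarely return exactly the same sub-pixel location twice.</p>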
<p>As you can see, this algorithm is fully automatic and can be run, for example, every hour for every camera. It is also quite robust to the presence of temporary objects like people in both images because it requires only a small fraction of the matches to have the same pixel coordinates. It seems to work badly (or not at all) only when there are very few good features available (for example, when the camera looks at a white wall). But in those cases it&#8217;s very hard to detect camera decalibration even for a human. One can also make this algorithm robust to illumination changes by using lighting-invariant point descriptors or by preprocessing the images.</p>
<p>OK, we have determined that a camera has become decalibrated. We also have the old camera position and orientation, and point matches between the current and old camera images. It seems that we can easily determine the new camera position automatically. Or can&#8217;t we?</p>
<h3>Camera can not be recalibrated automatically</h3>
<p>And here is why. Of course, there is a 3d reconstruction algorithm that takes a list of coordinates of matches together with the camera intrinsic matrix and returns the transform from the old camera view space into the view space of the new one (together with the 3d coordinates of the points). Unfortunately, it can calculate the transform and the 3d coordinates only <strong>up to scale</strong>. Neither the coordinates of the matched pixels nor the intrinsic matrix contain enough information about the scale of your scene. Are you measuring everything in meters? Or in inches? You need more information (like the real 3d coordinates of one of the matched points) to rescale the algorithm&#8217;s results. But you don&#8217;t have it. So the reconstruction algorithm gives you pretty much nothing (well, it gives you the correct camera orientation, but you can&#8217;t use it without a correctly scaled translation).</p>
<p>It&#8217;s interesting that I&#8217;ve failed to find any papers covering the topic of decalibration and recalibration, although it&#8217;s quite important for some practical computer vision applications. Should I write a short paper about it myself?</p>
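<p>The scale ambiguity is easy to check numerically. The sketch below uses a plain pinhole projection with NumPy; the intrinsic matrix, the camera motion and the 3d points are all made-up values. Multiplying both the scene points and the camera translation by the same factor leaves every pixel coordinate unchanged, so the images alone cannot tell the two scenes apart.</p>

```python
import numpy as np

def project(K, R, t, points):
    """Pinhole projection of (N, 3) points into pixel coordinates."""
    cam = points @ R.T + t            # old view space -> new camera view space
    pix = cam @ K.T                   # apply the intrinsic matrix
    return pix[:, :2] / pix[:, 2:3]   # perspective divide

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(5.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.2, 0.0, 0.1])
points = np.array([[0.0, 0.0, 4.0], [1.0, -0.5, 5.0], [-1.0, 0.5, 6.0]])

scale = 39.37  # e.g. the same scene measured in inches instead of meters
pixels_original = project(K, R, t, points)
pixels_scaled = project(K, R, scale * t, scale * points)
same_image = bool(np.allclose(pixels_original, pixels_scaled))
```

<p>Note that the rotation is untouched by the rescaling, which matches the remark that the orientation is recovered correctly while the translation is not.</p>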
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6621110.xml</wfw:commentRss></item><item><title>The nature of boosting</title><category>boosting</category><category>classification</category><category>regression</category><dc:creator>hr0nix</dc:creator><pubDate>Sun, 24 Jan 2010 18:07:45 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/1/24/the-nature-of-boosting.html</link><guid isPermaLink="false">376081:4052606:6418063</guid><description><![CDATA[<p>Most of the people involved into computer vision field are somehow familiar with&nbsp;<a href="http://en.wikipedia.org/wiki/Boosting">boosting</a>. They know that boosting is a powerful ML algorithm and should be used in cases when there are many weak learners available, but only few of them are discriminative enough. They also know that boosting achieves its goals by managing weight distribution on training samples: when new weak classifier is added to the linear combination, samples in the training set are reweighed in a way that samples misclassified by the new classifier will have their weights increased and sample classified correctly - decreased. Those sample weights are used when error of the weak classifier over the training set is calculated. So each new weak classifier is focused mostly on the samples which were misclassified by its colleagues.</p>
<p>That&#8217;s a very good explanation of what the <a href="http://en.wikipedia.org/wiki/AdaBoost">AdaBoost</a> algorithm does, but to understand boosting better you should go deeper. There are other interpretations of the boosting algorithm available, and some of them are really interesting.</p>
<h3>Boosting as forward stagewise additive modelling</h3>
<p>Boosting can be viewed as a forward stagewise additive modelling process. Let&#8217;s assume we have some loss function L(y,y&#8217;) and want to find an additive combination of weak classifiers that minimizes it. One way to do it is to use a greedy approach. First, let&#8217;s select a single weak classifier h1(x) that minimizes the loss function L(y, h1(x)) over the training set. Then, keeping h1 fixed, let&#8217;s select a weak classifier h2(x) that minimizes L(y, h1(x) + h2(x)). The next classifier, h3(x), should minimize L(y, h1(x) + h2(x) + h3(x)) and so on. This method has a great connection to boosting (explored, for example, in <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.9050&amp;rep=rep1&amp;type=pdf">this</a> paper or in <a href="http://www-stat.stanford.edu/~tibs/ElemStatLearn/">this great book</a>): if you choose the exponential loss function L(y, y&#8217;) = exp(-y*y&#8217;), you&#8217;ll get exactly the AdaBoost algorithm.</p>
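<p>The greedy procedure can be sketched in Python with NumPy for the exponential loss. The sketch makes several assumptions for brevity: one-dimensional inputs, threshold stumps as the weak classifiers, a made-up toy data set, and the well-known closed-form stage coefficient <code>alpha</code> that minimizes the exponential loss for a fixed stump. It is an illustration of the idea, not a full AdaBoost implementation.</p>

```python
import numpy as np

def stump_predictions(x):
    """Predictions of all threshold stumps on 1-D inputs, as a (num_stumps, n) array of +/-1."""
    xs = np.sort(x)
    thresholds = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0))
    preds = np.where(x[None, :] > thresholds[:, None], 1.0, -1.0)
    return np.vstack([preds, -preds])  # include both polarities

def forward_stagewise_exp(x, y, rounds):
    """Greedily add stumps so that the mean of exp(-y * F(x)) shrinks stage by stage."""
    H = stump_predictions(x)
    F = np.zeros(len(y))
    losses = []
    for _ in range(rounds):
        w = np.exp(-y * F)            # per-sample loss so far acts as the sample weight
        w = w / w.sum()
        errs = (H != y[None, :]).astype(float) @ w
        j = int(np.argmin(errs))      # stump with the smallest weighted error
        err = float(np.clip(errs[j], 1e-10, 1.0 - 1e-10))
        alpha = 0.5 * np.log((1.0 - err) / err)  # minimizes the exponential loss for this stump
        F = F + alpha * H[j]
        losses.append(float(np.mean(np.exp(-y * F))))
    return F, losses

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0])
F, losses = forward_stagewise_exp(x, y, rounds=5)
```

<p>The sample weights that AdaBoost maintains fall out of the loss itself here: <code>w</code> is just the exponential loss of each sample under the combination built so far.</p>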
<h3>Boosting as gradient descent</h3>
<p>At each step of boosting we&#8217;re selecting the weak classifier that maximally reduces the loss over the training set. So we&#8217;re in fact moving through the space of strong classifiers in the direction in which the loss function decreases. Each weak classifier selection here can be viewed as a gradient descent step. Approaches like gradient boosting are entirely based on this observation. In <a href="http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf">gradient boosting</a>, gradient descent is performed semi-explicitly: the strong classifier is represented by a vector of its values on the training set samples. Then, at each boosting step, the partial derivatives of the loss function (with respect to the values of the classifier trained so far) are calculated, and a regression tree is fitted to the negative gradient in a least-squares manner. That regression tree is then selected as the next weak classifier.</p>
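<p>A compact Python sketch of that loop follows. To keep it short it makes simplifying assumptions: squared loss (so the negative gradient is just the residual), one-dimensional inputs, a regression stump instead of a full regression tree, and a made-up learning rate and data set. All function names are hypothetical.</p>

```python
import numpy as np

def fit_regression_stump(x, target):
    """Least-squares regression stump on 1-D inputs: (threshold, left_value, right_value)."""
    best_sse, best_stump = np.inf, None
    xs = np.unique(x)  # sorted unique inputs; at least two are required
    for threshold in (xs[:-1] + xs[1:]) / 2.0:
        left = x <= threshold
        left_value, right_value = target[left].mean(), target[~left].mean()
        sse = np.sum((target - np.where(left, left_value, right_value)) ** 2)
        if sse < best_sse:
            best_sse, best_stump = sse, (threshold, left_value, right_value)
    return best_stump

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def gradient_boost(x, y, rounds=20, learning_rate=0.5):
    """Gradient boosting for the squared loss L = 0.5 * (y - F)^2."""
    F = np.full(len(y), y.mean())
    mse_history = [float(np.mean((y - F) ** 2))]
    for _ in range(rounds):
        negative_gradient = y - F  # -dL/dF evaluated on the training samples
        stump = fit_regression_stump(x, negative_gradient)
        F = F + learning_rate * stump_predict(stump, x)
        mse_history.append(float(np.mean((y - F) ** 2)))
    return F, mse_history

x = np.linspace(0.0, 6.0, 40)
y = np.sin(x)
F, mse_history = gradient_boost(x, y)
```

<p>For other loss functions only the line computing <code>negative_gradient</code> changes; the least-squares fit of the weak learner stays the same.</p>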
<h3>Boosting and game theory</h3>
<p>Boosting has several connections with game theory. Consider a matrix Mij with the number of rows equal to the number of samples in the training set and the number of columns equal to the number of weak classifiers. Mij is equal to 1 if the j-th weak classifier classifies the i-th sample correctly (otherwise it has the value 0). Then it can be shown using <a href="http://en.wikipedia.org/wiki/Minimax">von Neumann&#8217;s minimax theorem</a>&nbsp;(applied to the game in mixed strategies with payoff matrix M) that if for every possible weight distribution there exists a weak learner with error less than 1/2-eps, then there exists a convex combination of weak learners with a <a href="http://en.wikipedia.org/wiki/Margin_(machine_learning)">margin</a> of at least 2*eps for every training sample. It means that AdaBoost has at least a potential for success. One can go further and discover that boosting can be viewed as a special case of an algorithm for approximately solving matrix games. Both topics are covered very well in <a href="http://www.stat.duke.edu/courses/Spring04/sta226/freund96game.pdf">this paper</a>.</p>
<p>The connection to game theory also leads us to the idea of using linear or convex programming techniques in boosting. Some approaches are based on it, but I&#8217;m not very familiar with them.</p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6418063.xml</wfw:commentRss></item><item><title>Localization and mapping</title><category>links</category><category>vision</category><dc:creator>hr0nix</dc:creator><pubDate>Wed, 20 Jan 2010 13:01:03 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2010/1/20/localization-and-mapping.html</link><guid isPermaLink="false">376081:4052606:6375743</guid><description><![CDATA[<p>Hey, have you ever seen <a href="http://www.robots.ox.ac.uk/~gk/">George Klein</a>&#8217;s <a href="http://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping">SLAM</a> engine in action? I can&#8217;t even call it awesome. It&#8217;s better. And it&#8217;s not just the algorithm itself. Just look into possibilities it opens.</p>
<p style="text-align: center;"><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/Y9HMn6bd-v8&hl=ru_RU&fs=1&"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/Y9HMn6bd-v8&hl=ru_RU&fs=1&" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p style="text-align: center;"><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/pBI5HwitBX4&hl=ru_RU&fs=1&"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/pBI5HwitBX4&hl=ru_RU&fs=1&" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6375743.xml</wfw:commentRss></item><item><title>An interesting approach to camera calibration</title><category>papers</category><category>vision</category><dc:creator>hr0nix</dc:creator><pubDate>Fri, 25 Dec 2009 15:00:09 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/12/25/an-interesting-approach-to-camera-calibration.html</link><guid isPermaLink="false">376081:4052606:6140895</guid><description><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Camera_resectioning">Camera calibration</a> is a process of determining intrinsic (like principal point or focal length) and extrinsic (position and orientation in space) parameters of a camera, which is often described by the <a href="http://en.wikipedia.org/wiki/Pinhole_camera_model">pinhole camera model</a>. In computer vision we usually perform calibration by analyzing images taken from camera. The most widely used approach to camera calibration is based on <a href="http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf">this paper</a> by Zhengyou Zhang. It involves chessboard (or some other planar calibration pattern) and consists of the following steps:</p>
<ol>
<li>Find a chessboard (bigger is better). Note that its width and height (measured in chessboard squares) should be distinct, and one of them should be even and the other odd (otherwise the board looks the same after a 180-degree rotation and you will be unable to determine its orientation from a picture).</li>
<li>Find a &#8220;calibration dude&#8221;.</li>
<li>Calibration dude takes a chessboard and waves it in front of the camera attached to a computer with calibration software installed. Calibration software takes about 20 images with distinct chessboard orientations (camera orientation remains the same, of course), finds inner corners of the chessboard on every image and then uses them together with information about real-world size of chessboard to determine intrinsic camera parameters.</li>
<li>Calibration dude puts the chessboard on the floor in such a way that the camera still&nbsp;can see it. Calibration software then takes one more image from the camera, finds the chessboard corners in it (again) and calculates the camera position assuming that some predefined chessboard corner is located at the coordinate system origin and the chessboard sides are aligned with the coordinate system axes. Of course, any other chessboard orientation can be specified in&nbsp;software, but this one is the simplest.</li>
</ol>
<p>What problems do we have here? First of all, the camera may see no floor at all, so we can&#8217;t just put the chessboard on it during step 4. Instead we need to set it up somewhere else, not at ground level. We should then carefully measure its position and orientation and pass them as input to the calibration tool.</p>
<p>What if we have more than one camera seeing no floor, and the fields of view of those cameras do not overlap? In this case we should repeat the process described above for each camera, carefully measuring the chessboard position in the world coordinate system every time. In fact, it&#8217;s a pain in the ass. Calibrating multiple cameras that way can be really slow and error-prone.</p>
<p>A much more interesting approach to calibrating multiple non-overlapping cameras was proposed in this <a href="http://www.cs.unc.edu/~ramkris/MirrorCameraCalib.html">paper</a>. Its key idea is to fix the chessboard position (put it at the origin) and move a mirror instead. The cameras will see the chessboard&#8217;s reflection in that mirror and use the reflected image for calibration. Of course, some questions arise.</p>
<ol>
<li>Is it legal to determine intrinsic camera parameters using a reflected chessboard image? The answer is simple: yes. The authors prove that common calibration techniques give the same result (except for the coordinate system handedness) when applied to mirrored images.</li>
<li>Don&#8217;t we need to know the position and orientation of the mirror when calibrating extrinsic parameters? No, we don&#8217;t. It turns out that every mirrored chessboard image imposes a constraint on the position and orientation of the real camera. And if we have five (or more) such images, we can reconstruct the position and orientation without any knowledge of the mirror position.</li>
</ol>
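<p>The handedness remark in the first answer can be illustrated with a few lines of NumPy. This is a sketch of the geometry of a planar mirror only, not of the calibration method from the paper; the mirror normal and distance are made-up values and <code>mirror_matrix</code> is a hypothetical helper.</p>

```python
import numpy as np

def mirror_matrix(normal, distance):
    """4x4 homogeneous reflection about the plane {x : normal . x = distance}."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)  # Householder reflection
    M[:3, 3] = 2.0 * distance * n
    return M

M = mirror_matrix([0.0, 0.3, 1.0], 2.0)
determinant = float(np.linalg.det(M[:3, :3]))
double_reflection = M @ M
```

<p>The linear part has determinant -1, which is exactly the flip of handedness, and reflecting twice gives the identity transform back.</p>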
<p>This approach can save a lot of time and help to reduce the part of the calibration error that arises from measuring the chessboard position and orientation incorrectly. But it has its own drawbacks, of course. First of all, a mirror is rather heavy. It&#8217;s not easy to manipulate if your calibration dude is not a beefcake. Next, it&#8217;s hard to change the orientation of the calibration pattern in the frame from one snapshot to another when using a mirror. The mirror has to be in the field of view of the camera, and oriented such that the pattern&rsquo;s image is reflected into the camera. These requirements may result in little variation in the pattern orientations as seen by the camera in the mirror and lead to a degenerate solution.</p>
<p>Despite the drawbacks, this approach has the potential to increase speed of the multiple camera calibration process a lot. We will probably try it in 2010.</p>

<div style="border:1px solid gray;">
<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.jtitle=2008+IEEE+Conference+on+Computer+Vision+and+Pattern+Recognition&rft_id=info%3A%2F&rfr_id=info%3Asid%2Fresearchblogging.org&rft.atitle=Simple+calibration+of+non-overlapping+cameras+with+a+mirror&rft.issn=&rft.date=2008&rft.volume=&rft.issue=&rft.spage=&rft.epage=&rft.artnum=&rft.au=Ram+Krishan+Kumar&rft.au=Adrian+Ilie&rft.au=Jan-Michael+Frahm&rft.au=Marc+Pollefeys&rfe_dat=bpr3.included=1;bpr3.tags=Computer+Science%2CArtificial+Intelligence">Ram Krishan Kumar, Adrian Ilie, Jan-Michael Frahm, & Marc Pollefeys (2008). Simple calibration of non-overlapping cameras with a mirror <span style="font-style: italic;">2008 IEEE Conference on Computer Vision and Pattern Recognition</span></span>
</div>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6140895.xml</wfw:commentRss></item><item><title>Some strange art detected</title><category>art</category><category>fun</category><dc:creator>hr0nix</dc:creator><pubDate>Wed, 23 Dec 2009 17:55:10 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/12/23/some-strange-art-detected.html</link><guid isPermaLink="false">376081:4052606:6130190</guid><description><![CDATA[<p>A few days ago the building where our company&#8217;s office is located hosted an exhibition by some guy who draws labyrinths and then sticks pieces of old hardware (like old CPUs or floppy disks) onto his paintings. I think his work is quite weird, but maybe you&#8217;ll like it, who knows :)</p>
<p style="text-align: center;"><span class="full-image-block ssNonEditable"><span><img style="width: 400px;" src="http://sexdrugsandappliedscience.com/storage/post-images/labyrinth1.JPG?__SQUARESPACE_CACHEVERSION=1261591291574" alt="" /></span></span></p>
<p style="text-align: center;"><span class="full-image-block ssNonEditable"><span><img style="width: 400px;" src="http://sexdrugsandappliedscience.com/storage/post-images/labyrinth2.JPG?__SQUARESPACE_CACHEVERSION=1261591329601" alt="" /></span></span></p>
<p style="text-align: center;"><span class="full-image-block ssNonEditable"><span><img style="width: 400px;" src="http://sexdrugsandappliedscience.com/storage/post-images/labyrinth3.JPG?__SQUARESPACE_CACHEVERSION=1261591484304" alt="" /></span></span></p>
<p style="text-align: center;"><span class="full-image-block ssNonEditable"><span><img style="width: 400px;" src="http://sexdrugsandappliedscience.com/storage/post-images/labyrinth4.JPG?__SQUARESPACE_CACHEVERSION=1261591522105" alt="" /></span></span></p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6130190.xml</wfw:commentRss></item><item><title>Bayesian approach: don't hurry when reasoning</title><category>bayesian</category><category>fun</category><category>math</category><dc:creator>hr0nix</dc:creator><pubDate>Mon, 07 Dec 2009 11:52:54 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/12/7/bayesian-approach-dont-hurry-when-reasoning.html</link><guid isPermaLink="false">376081:4052606:6007596</guid><description><![CDATA[<p>A few days ago a friend of mine (let&#8217;s call him A) mentioned a psychological test. The test consists of a single question, and, according to some statistics, 98 percent of serial killers answer that question right. When another friend of mine (B) gave the correct answer, A said that it&#8217;s highly probable that B is a serial killer. Was he right? Of course he wasn&#8217;t. A&#8217;s problem is that he is not familiar with the Bayesian approach at all. Here is why.</p>
<p>Let R be the event of giving the right answer to the question and M be the event that the person answering is a maniac. From the gathered statistics we know that P(R | M) = 0.98. Next, from Bayes&#8217; theorem we know that P(M | R) = P(R | M)P(M)/P(R). P(R) can be expanded as P(R | M)P(M) + P(R | not M)P(not M). Next, assume that about 5 percent of ordinary people also give the right answer, so P(R | not M) = 0.05. Then, what&#8217;s the prior probability of M? I think we all agree that it&#8217;s quite small, about 1e-5 or even less. Now we are ready to calculate the posterior probability of M:</p>
<p>P(M | R) = (0.98 * 1e-5) / (0.98 * 1e-5 + 0.05 * (1 - 1e-5)) = 0.0000098 / (0.0000098 + 0.0499995) ~ 0.0002.</p>
<p>The probability is very small, but why is that? That&#8217;s because my friend A was talking about the likelihood and didn&#8217;t take the prior probabilities into account. And in this case the prior probabilities are of great importance.</p>
<p>Btw, what if serial killers always answer right and normal people always give wrong answers? Then P(R | M) = 1, P(R | not M) = 0 and P(M | R) = 1, so our model works in extreme cases too.</p>
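<p>The calculation above is easy to replay in code. Here is a minimal Python sketch; the function name <code>posterior</code> and its parameter names are made up for illustration, and the numbers are the ones assumed in this post.</p>

```python
def posterior(prior, p_right_given_m, p_right_given_not_m):
    """P(M | R) from Bayes' theorem, with P(R) expanded by total probability."""
    evidence = p_right_given_m * prior + p_right_given_not_m * (1.0 - prior)
    return p_right_given_m * prior / evidence


# Numbers from the post: P(M) = 1e-5, P(R | M) = 0.98, P(R | not M) = 0.05.
p_killer = posterior(1e-5, 0.98, 0.05)
print(f"P(M | R) = {p_killer:.6f}")

# Extreme case: killers always answer right, everyone else always answers wrong.
p_extreme = posterior(1e-5, 1.0, 0.0)
print(f"extreme case: P(M | R) = {p_extreme}")
```

<p>Try raising the prior to see how quickly the posterior grows: the likelihood alone says very little until the prior is taken into account.</p>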
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-6007596.xml</wfw:commentRss></item><item><title>Bayesian approach: Lorenzo von Matterhorn</title><category>bayesian</category><category>fun</category><category>math</category><dc:creator>hr0nix</dc:creator><pubDate>Fri, 27 Nov 2009 11:43:59 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/11/27/bayesian-approach-lorenzo-von-matterhorn.html</link><guid isPermaLink="false">376081:4052606:5929469</guid><description><![CDATA[<p>Today we will consider a more sophisticated inference example, with both a closed-form solution and an Infer.NET program. We are going to find answers to some very important questions closely connected to the <a href="http://www.youtube.com/watch?v=XgPWFPJmqiw&amp;feature=related">Lorenzo von Matterhorn</a> trick from <a href="http://en.wikipedia.org/wiki/Barney_Stinson">Barney Stinson</a>&#8217;s <a href="http://www.youtube.com/watch?v=TV-NhfgoA7A&amp;feature=PlayList&amp;p=083956957E539DDF&amp;index=0">Playbook</a>.</p>
<p>First of all, let&rsquo;s figure out the factors that influence the successful completion of the trick. It looks reasonable that the girl should search for your imaginary name in Google. Without that, the trick wouldn&rsquo;t work. It means that the girl should have some internet device and the area should be covered with internet (WiFi, 3G etc). Also, if your ugliness outweighs your imaginary wealth and fame, the girl may not come with you (nevertheless, it&rsquo;s highly improbable). These assumptions allow us to build the following probabilistic model with discrete variables:&nbsp;</p>
<p style="text-align: center;"><span class="full-image-block ssNonEditable"><span><img style="width: 420px;" src="http://sexdrugsandappliedscience.com/storage/post-images/Network.png?__SQUARESPACE_CACHEVERSION=1260610548119" alt="" /></span></span></p>
<p>In fact our model is a <a href="http://en.wikipedia.org/wiki/Bayesian_network">Bayesian network</a>.&nbsp;It means that we should specify a probability distribution for each variable conditional on its parents. Here I means she has an internet device, C that the area has internet coverage, G that she googles your name, H that you are handsome and S that the trick succeeds. Let&#8217;s get started.</p>
<p>P(I) = 0.8 because almost all modern phones have at least GPRS support.<br />P(C) = 0.9 because internet is available practically everywhere nowadays.<br />P(G | I, C) = 0.95 (wouldn&#8217;t you google this strange man?)<br />P(G | not I, C) = 0.3 because she can ask someone who has internet to google.<br />P(G | I, not C) = P(G | not I, not C) = 0 because nobody can google without internet.<br />P(H) = 0.2 because not many guys are handsome.<br />P(S | G, H) = 0.99<br />P(S | not G, H) = 0.5 (it&#8217;s nice to be attractive, huh)<br />P(S | G, not H) = 0.9 because wealth is more important than good looks.<br />P(S | not G, not H) = 1e-5 (the worst case).</p>
<p>Computing the probability of success is quite straightforward, so we will concentrate on two slightly more sophisticated questions:</p>
<ol>
<li>What&#8217;s the probability of you being handsome if you&#8217;ve succeeded in the trick?&nbsp;</li>
<li>If you haven&#8217;t succeeded in the trick, what&#8217;s the probability that she looked up your imaginary name in Google?</li>
</ol>
<p>To compute those probabilities, we&#8217;ll use <a href="http://en.wikipedia.org/wiki/Bayes_theorem">Bayes&#8217; theorem</a> together with the <a href="http://en.wikipedia.org/wiki/Law_of_total_probability">law of total probability</a>. This blog is not really a comfortable place for a lot of formulas, so the whole derivation is available in a <a href="http://sexdrugsandappliedscience.com/storage/research-stuff/papers/Lorenzo.pdf">separate PDF</a>. As we can see there, it&#8217;s not very hard to show that P(H | S) = 0.2449&nbsp;and P(G | not S) = 0.2042.&nbsp;We can also write a small <a href="http://pastebin.com/f2bf67f09">inference program</a> using the Infer.NET library, which gives exactly the same results.</p>
<p>The calculated probabilities agree with common sense very well. The value of P(H | S)&nbsp;shows that you don&#8217;t have to be a handsome guy to pull off the trick. And the value of P(G | not S) tells us that most failures (about 80 percent) happen when the girl doesn&#8217;t google your name for some reason.</p>
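<p>The network is small enough to check both numbers by brute force. The Python sketch below is not the Infer.NET program linked above; it simply sums the joint distribution over all 32 assignments. It assumes the structure implied by the tables (G depends on I and C, S depends on G and H), and the helper names are made up for illustration.</p>

```python
from itertools import product

# Conditional probability tables copied from the post.
P_I = 0.8   # she has an internet-capable device
P_C = 0.9   # the area has internet coverage
P_H = 0.2   # you are handsome
P_G = {(True, True): 0.95, (False, True): 0.3,
       (True, False): 0.0, (False, False): 0.0}    # P(G | I, C), keyed by (I, C)
P_S = {(True, True): 0.99, (False, True): 0.5,
       (True, False): 0.9, (False, False): 1e-5}   # P(S | G, H), keyed by (G, H)


def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(i, c, g, h, s):
    """Joint probability of one full assignment, factorised along the network."""
    return (bernoulli(P_I, i) * bernoulli(P_C, c) * bernoulli(P_H, h)
            * bernoulli(P_G[(i, c)], g) * bernoulli(P_S[(g, h)], s))


def probability(event):
    """Sum the joint over every assignment (i, c, g, h, s) for which `event` holds."""
    return sum(joint(*x) for x in product([True, False], repeat=5) if event(*x))


p_h_given_s = (probability(lambda i, c, g, h, s: h and s)
               / probability(lambda i, c, g, h, s: s))
p_g_given_not_s = (probability(lambda i, c, g, h, s: g and not s)
                   / probability(lambda i, c, g, h, s: not s))
print(round(p_h_given_s, 4), round(p_g_given_not_s, 4))
```

<p>Enumeration like this only works for tiny discrete networks, since the number of assignments doubles with every binary variable. That is exactly why real models need an inference engine.</p>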
<p>That was another example of probabilistic logic in action. Next time we&#8217;ll consider a continuous case such as linear regression.</p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-5929469.xml</wfw:commentRss></item><item><title>Revolt of the C++ haters</title><category>fun</category><category>programming</category><dc:creator>hr0nix</dc:creator><pubDate>Thu, 19 Nov 2009 12:36:32 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/11/19/revolt-of-the-c-haters.html</link><guid isPermaLink="false">376081:4052606:5849495</guid><description><![CDATA[<p>Some of you have probably heard that GCC and the Visual Studio C++ compiler are written in C++. I guess that most C++ compilers are (if not, my plan will fail). So, if all the compiler binaries somehow disappear someday, all the C++ sources (including the sources of the compilers themselves) will become useless. But how can we achieve that?</p>
<p>The simplest (but also the hardest) way is to add some kind of time bomb to the sources of every C++&nbsp;compiler. The time bomb should go off a few years later, when everyone is using a compromised version of the compiler. Once it is activated, all the compiler executables will be destroyed on the first compilation. Nice scenario, isn&#8217;t it? The problem is to add the time bomb in such a way that nobody on the compiler team notices it. An interesting problem for a great hacker, I guess. Man, if you do it, you&#8217;ll become a legend!</p>
<p>There is an alternative option. Not so powerful, but still possible. Every virus maker <em>who cares</em> should add <em>search-for-compilers-and-delete-them-when-time-bomb-is-activated</em> code to her (or his) brainchild. The problem here is that there are not many computer viruses for *nix systems, but most C++ compiler installations are concentrated there. Anyway, people, we should try!</p>
<p>I propose the following date for the time-bomb activation: December 12, 2012. Develop your viruses and keep it quiet.</p>
<p>Btw, it could also be the plot of a great action movie. Imagine a well-secured subterranean vault for&nbsp;<em>the Last Compiler</em>, a lot of men with big guns, and a bunch of heroes trying to fix mankind&#8217;s biggest mistake: the invention of the C++ language.</p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-5849495.xml</wfw:commentRss></item><item><title>Bayesian approach: introduction</title><category>bayesian</category><category>math</category><category>thoughts</category><dc:creator>hr0nix</dc:creator><pubDate>Tue, 10 Nov 2009 09:27:19 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/11/10/bayesian-approach-introduction.html</link><guid isPermaLink="false">376081:4052606:5751838</guid><description><![CDATA[<p style="text-align: left;">As I&#8217;ve already said, I&#8217;m going to write a few posts about the Bayesian approach to probability theory and, especially, to statistical machine learning. Someday I&#8217;m going to be an expert in this topic, but currently I&#8217;m far from it :) So the following series of posts is my attempt to make things clear for myself and, hopefully, for someone just starting to learn this amazing topic. That means there can be some mistakes. If you find any, I&#8217;ll be glad if you point them out as soon as possible.</p>
<h3>What is probability?</h3>
<p>First of all, let&#8217;s see how Bayesians treat probability. In the Bayesian approach to probability theory, probability is &#8220;a measure of a state of knowledge&#8221;. It means that probability expresses one&#8217;s knowledge about the value of some variable, not the stochastic nature of the variable itself. In the frequentist approach, instead, probability is an &#8220;objective uncertainty&#8221; that arises from the nature of the experiment. Nevertheless, there aren&#8217;t many truly uncertain&nbsp;things in our world. Someone who knows all the forces acting on a flipped coin can accurately predict the outcome of the experiment without involving any probabilities. The only &#8220;objective uncertainty&#8221; known to me is the one that arises in quantum mechanics, though I should mention that we may someday discover hidden factors that influence it.</p>
<h3>Knowledge expression via random variables</h3>
<p>Ok, probability is &#8220;a measure of a state of knowledge&#8221;. But how can we express our knowledge using it? Let&#8217;s consider a few simple examples.</p>
<p>Let&#8217;s assume we have some real number, A, and we don&#8217;t know its exact value, but somehow we know that it can only take values between 0 and 1. Then in the Bayesian setting we can treat A as a random variable with a uniform distribution on [0,1]. Why is the distribution uniform? Because the only thing we know about A&nbsp;is its range and no other knowledge is available, so we have no reason to prefer some values in that range over others.</p>
<p>Another example. Imagine that we have a real number B whose exact value is almost known. We are pretty sure it equals four, or at least is very close to it. How can we express that knowledge? Well, we can say that the random variable B&nbsp;has a Gaussian distribution with mean 4 and a very small standard deviation. In that case deviations of B&nbsp;from the mean are possible, but large ones are highly improbable.</p>
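<p>To make the two examples concrete, here is a small Python sketch that turns each state of knowledge into a distribution and asks it a question. The helper <code>uniform_prob</code> is made up for illustration, and the standard deviation of 0.01 for B is an arbitrary stand-in for &#8220;very small&#8221;.</p>

```python
from statistics import NormalDist


def uniform_prob(a, b, low=0.0, high=1.0):
    """P(A in [a, b]) when A is uniformly distributed on [low, high]."""
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)


# A: only the range [0, 1] is known, so intervals of equal width are equally likely.
print(uniform_prob(0.0, 0.1), uniform_prob(0.9, 1.0))

# B: "almost surely 4". The standard deviation 0.01 is an illustrative choice.
b_belief = NormalDist(mu=4.0, sigma=0.01)
p_near_4 = b_belief.cdf(4.03) - b_belief.cdf(3.97)
print(p_near_4)
```

<p>The last number is the probability that B lies within three standard deviations of 4, which is how a Gaussian belief says &#8220;almost certainly very close to four&#8221;.</p>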
<h3>Prior and posterior knowledge</h3>
<p>The most important formula in Bayesian probability theory is, of course, <a href="http://en.wikipedia.org/wiki/Bayes'_formula">Bayes&#8217; law</a>. It shows us how our <em>prior</em> knowledge about A changes after observing B and becomes <em>posterior</em> knowledge. Bayes&#8217; formula has a rather simple form:</p>
<p style="text-align: center;"><span class="full-image-inline ssNonEditable"><span><img src="http://upload.wikimedia.org/math/6/b/c/6bce478cdca0db60def5bf6059404c90.png?__SQUARESPACE_CACHEVERSION=1257870286190" alt="" /></span></span></p>
<p>The first term in the numerator is the so-called likelihood function, which connects the observation with the unknown value. It shows how probable it is to observe the given&nbsp;B for some value of A.&nbsp;The second&nbsp;term, P(A), represents our prior knowledge about A: everything we know about it regardless of any observations. The denominator does not depend on A; it&#8217;s just a normalizing constant that makes the posterior a proper probability distribution. The result, P(A|B), is the conditional probability of A given some value of B. In terms of knowledge it is the answer to the question &#8220;What probability distribution represents the knowledge about&nbsp;A after we&#8217;ve learned the value of B?&#8221;</p>
<h3>What can we do in the Bayesian setting?</h3>
<p>So, what&#8217;s it all about? Why do we need probabilities to encode our knowledge? What&#8217;s the profit? The profit is that we can infer knowledge using Bayes&#8217; theorem. More formally, we can solve the following problem: &#8220;Given a set of observations B, related to an unknown A through a likelihood function P(B|A), find A*, an estimate of A&#8221;. The common statistical approach here is <a href="http://en.wikipedia.org/wiki/Maximum_likelihood_estimation">maximum likelihood</a> (ML) estimation: A* = argmax P(B|A), which simply finds the value of A for which the likelihood is maximal. While this approach is <a href="http://en.wikipedia.org/wiki/Efficiency_(statistics)">asymptotically&nbsp;efficient</a>, it works well only when B contains a lot of observations. If we don&#8217;t have many observations in B but do have some prior knowledge about A, we can instead estimate A* = argmax P(A|B) = argmax P(B|A) P(A) (<a href="http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation">MAP estimation</a>, which is in fact the maximum likelihood approach regularized with prior knowledge) or even A* = E[A | B], the expectation of A with respect to the conditional distribution P(A|B) (<a href="http://en.wikipedia.org/wiki/Bayes_estimator">Bayes estimator</a>). When B is small and the prior is sensible, the Bayesian approach usually works much better than ML estimation. It works even when there are no observations at all (in that case, all the knowledge is just taken from the prior distribution P(A))! At the same time, when B contains a lot of information, Bayesian estimation works as well as the ML approach and is also asymptotically efficient.</p>
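<p>Here is a rough Python sketch of the three estimators on a toy problem that is not from this post: A is the unknown bias of a coin, B is a series of flips, and the prior on A is assumed to be Beta(2, 2), i.e. &#8220;probably close to fair&#8221;. For this prior the posterior is again a Beta distribution, so the MAP and Bayes estimates have simple closed forms. All function names are made up for illustration.</p>

```python
def ml_estimate(heads, flips):
    """Maximum likelihood: the bias that maximises P(B | A). Undefined for 0 flips."""
    return heads / flips


def map_estimate(heads, flips, a=2.0, b=2.0):
    """Mode of the Beta(a + heads, b + tails) posterior (valid for a, b above 1)."""
    return (heads + a - 1.0) / (flips + a + b - 2.0)


def bayes_estimate(heads, flips, a=2.0, b=2.0):
    """Mean of the Beta(a + heads, b + tails) posterior."""
    return (heads + a) / (flips + a + b)


# Few observations: ML jumps to an extreme value, the prior keeps the others sane.
# Many observations: all three estimates nearly coincide.
for heads, flips in [(3, 3), (600, 1000)]:
    print(flips, ml_estimate(heads, flips),
          map_estimate(heads, flips), bayes_estimate(heads, flips))

# No observations at all: only the prior is left, and both estimates return 0.5.
print(map_estimate(0, 0), bayes_estimate(0, 0))
```

<p>After three heads in three flips the ML estimate claims the coin always lands heads, while the MAP and Bayes estimates stay noticeably closer to fair. With a thousand flips the prior hardly matters any more.</p>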
<h3>Bayesian approach and boolean logic</h3>
<p>The Bayesian approach can be viewed as an extension of Boolean logic to cases where we are not certain whether some statement is true. A more detailed explanation can be found, for example, in the book &#8220;<a href="http://bayes.wustl.edu/etj/prob/book.pdf">Probability Theory: The Logic of Science</a>&#8221; by Edwin Thompson Jaynes. We will consider, instead, a few intuitive examples:</p>
<p><span style="text-decoration: underline;">Boolean logic</span>: A&amp;B is true, B is true. What&#8217;s the value of A? Well, A is also true.<br /><span style="text-decoration: underline;">Probabilistic logic</span>: P(A, B) = 1. Then P(A | B) = P(A, B) / P(B) = P(A, B) / (P(A, B) + P(not A, B)) = 1 / (1 + 0) = 1.</p>
<p><span style="text-decoration: underline;">Boolean logic</span>: A-&gt;B. B is true. What&#8217;s the value of A? Oh, we don&#8217;t know. There is uncertainty!<br /><span style="text-decoration: underline;">Probabilistic logic</span>: P(B | A) = 1. P(A | B) = P(B | A) P(A) / P(B) = P(A) / P(B) = P(A) / (P(A) + P(not A) P(B | not A)).</p>
<p>In the first example probability theory just draws&nbsp;the same conclusion about the value of A as Boolean logic does. Simple and obvious, isn&#8217;t it? Let&#8217;s then look at the second example. There, classical Boolean logic tells us nothing about the value of A even if we know for sure that B is true. But in the Bayesian setting it&#8217;s just posterior inference with a known likelihood. The answer we&#8217;ve got is &#8220;The probability of A depends on the prior probabilities of A and B&#8221;. If we observe B=true four times more frequently than A=true, or, in the Bayesian treatment, we are four times more sure that B is true than that A is true, then P(A | B) = 0.25 and A is probably false. It&#8217;s very natural: if we see B more often than A, then B also happens without A, so when we see B, A can still be false.</p>
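<p>The second example fits in a few lines of Python. This is only a sketch: the function name is made up, and the concrete values P(A) = 0.2 and P(B | not A) = 0.75 are chosen so that B is four times as probable as A.</p>

```python
def p_a_given_b(p_a, p_b_given_not_a):
    """P(A | B) when A implies B, i.e. P(B | A) = 1."""
    p_b = p_a + (1.0 - p_a) * p_b_given_not_a
    return p_a / p_b


# B is four times as probable as A: P(A) = 0.2, P(B) = 0.2 + 0.8 * 0.75 = 0.8.
print(p_a_given_b(0.2, 0.75))

# If B never happens without A, observing B makes A certain, as in Boolean logic.
print(p_a_given_b(0.2, 0.0))
```

<p>The second call shows the Boolean limit: once B can only happen together with A, the uncertainty disappears.</p>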
<p>I want you to feel that probabilistic logic is much more powerful than classical Boolean logic, especially when reasoning about the real world, where almost nothing is fully known but a lot of prior knowledge is available.</p>
<h3>Bayesian approach and Occam&#8217;s razor</h3>
<p>One of the important philosophical principles on which all of modern science relies is <a href="http://en.wikipedia.org/wiki/Occam's_razor">Occam&#8217;s razor</a>. In Latin it&#8217;s phrased as &#8220;<em>entia non sunt multiplicanda praeter necessitatem&#8221;<span style="font-style: normal;">, which means &#8220;entities must not be multiplied beyond necessity&#8221;: the simplest explanation is preferable when alternatives are available.&nbsp;How is it related to the Bayesian approach?</span></em></p>
<p><em><span style="font-style: normal;">First, let&#8217;s look at Bayes&#8217; formula again. Assume that&nbsp;M is some&nbsp;model of D, the observed data. Then P(M|D) expresses our knowledge about which&nbsp;model is preferable when&nbsp;D is observed. Using Bayes&#8217; formula, P(M|D) can be decomposed into a normalizing constant, a likelihood term, which expresses how&nbsp;well&nbsp;model M&nbsp;fits the observed data, and P(M), the prior knowledge&nbsp;about M. If P(M) gives more probability to simpler models, then among all the models that fit the data equally well, the simplest one will be selected.</span></em></p>
<p><em><span style="font-style: normal;">But that&#8217;s not all. Assume that you have two models with equal prior probabilities, say, M1 and M2. When do we select M1 then? It&#8217;s quite reasonable to select M1 when P(D|M1) &gt; P(D|M2) and to select M2 in all the other cases. Next, if M1&nbsp;is a simpler model&nbsp;than M2, then M1 can explain fewer possible datasets than M2 can,&nbsp;because M2 is more flexible. But P(D|M) has to sum to one over all possible datasets, so M1 spreads its probability over fewer of them, which means that P(D|M1) is greater for the data it explains well. That&#8217;s why we&#8217;ll select the simpler, more constrained model even when the prior probabilities of the models are equal.&nbsp;Was it unclear?&nbsp;Look at the picture I&#8217;ve taken <a href="http://alumni.media.mit.edu/~tpminka/statlearn/demo/">here</a>:</span></em></p>
<p style="text-align: center;"><img src="http://alumni.media.mit.edu/~tpminka/statlearn/demo/occam.gif?__SQUARESPACE_CACHEVERSION=1257886238514" alt="" /></p>
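<p>The same effect shows up in a toy example that is not from the picture: M1 says a coin is exactly fair, while M2 allows any bias and puts a uniform prior on it. The Python sketch below compares P(D|M1) and P(D|M2) for one particular sequence of flips. The models and function names are assumptions made purely for illustration.</p>

```python
from math import factorial


def evidence_fair(heads, tails):
    """P(D | M1): the simple model says the coin is exactly fair."""
    return 0.5 ** (heads + tails)


def evidence_any_bias(heads, tails):
    """P(D | M2): unknown bias with a uniform prior, integrated out analytically.

    The integral of t**heads * (1 - t)**tails over [0, 1] equals
    heads! * tails! / (heads + tails + 1)!.
    """
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)


for heads, tails in [(5, 5), (10, 0)]:
    m1 = evidence_fair(heads, tails)
    m2 = evidence_any_bias(heads, tails)
    print(heads, tails, m1, m2, "M1" if m1 > m2 else "M2")
```

<p>For five heads and five tails the simple model wins, although the flexible one can fit that data just as well. The flexible model only takes over when the data (ten heads in a row) is something the fair coin explains badly.</p>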
<p style="TEXT-ALIGN: left">&nbsp;</p>
<h3><span>Conclusion</span></h3>
<p>I hope I&#8217;ve interested you in the Bayesian approach at least a little bit. More posts will follow. What do you want to hear about? Here&#8217;s what I have planned:</p>
<ul>
<li>An example of more sophisticated inference in a simple Bayesian network, which must be related to hot chicks somehow :)</li>
<li>Linear regression in Bayesian setting. Regularization.</li>
<li>Automatic relevance determination.</li>
<li>Infer.NET examples for everything</li>
</ul>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-5751838.xml</wfw:commentRss></item><item><title>Treasury</title><category>blog</category><dc:creator>hr0nix</dc:creator><pubDate>Sun, 08 Nov 2009 14:19:57 +0000</pubDate><link>http://sexdrugsandappliedscience.com/blog/2009/11/8/treasury.html</link><guid isPermaLink="false">376081:4052606:5735374</guid><description><![CDATA[<p>I&#8217;ve created the <a href="http://sexdrugsandappliedscience.com/treasury/">Treasury</a> page where I will collect small descriptions (together with links) of different interesting things such as&nbsp;must-read books, helpful web-resources, great educational content and so on. There is not much stuff in the treasury now, but I&#8217;ll fill it in time, I swear =) Sensible suggestions are accepted.</p>
<p>Currently I&#8217;m thinking about a series of posts on the Bayesian approach to probability theory and, especially, to statistical learning. <a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/">Infer.NET</a> examples will be involved for sure.</p>
]]></description><wfw:commentRss>http://sexdrugsandappliedscience.com/blog/rss-comments-entry-5735374.xml</wfw:commentRss></item></channel></rss>