Powered by Squarespace
This form does not yet contain any fields.

    Entries in programming (10)

    Tuesday
    May182010

    An Introduction to MapReduce

    Today I’m going to write about MapReduce. It’s quite popular topic these days, and it’s the one I’m particularly interested in.

    Click to read more ...

    Thursday
    Nov192009

    Revolt of the C++ haters

    Probably, some of you have heard that GCC and Visual Studio C++ Compiler are written in C++. I guess that most of the C++ compilers are (if not, my plan will fail). So, if all the compiler binaries will somehow disappear someday, all the C++ sources (including sources of the compiler itself) will become useless. But how can we achieve that?

    The most simple (but also the most hard) way is to add some kind of time bomb to the sources of every C++ compiler. That time-bomb should be activated a few years later, when everyone will be using compromised version of the compiler. After the activation of the time-bomb all the compiler executables will be destroyed on the first compilation. Nice scenario, isn’t it? The problem is to add time-bomb in a way that nobody in the team of compiler developers will notice it. An interesting problem for a great hacker, I guess. Man, if you do it, you’ll become a legend!

    There is an alternative option. Not so powerful, but still possible. Every virus maker who cares should add search-for-compilers-and-delete-them-when-time-bomb-is-activated code to her (or his) brainchild. The problem here is that there are not much computer viruses for *nix systems, but most of the C++ compiler instances are concentrated there. Anyway, people, we should try!

    I propose the following date for the time-bomb activation: December 12, 2012. Develop your viruses and keep it quiet.

    Btw, it also can be a plot for a great action movie. Imagine some well-secured subterranean storage for the Last Compiler, a lot of men with big guns, and a bunch of heroes trying to fix the men’s biggest mistake: C++ language invention.

    Wednesday
    Jul292009

    Two simple but funny C++ puzzles

    You are given 2 pieces of code:

    void func()
    {
      variable = 666;
    }
    
    int variable;
    
    void func()
    {
      variable = 666;
    }
    

    Make them compile. You can add any C++ constructs after and before the whole code block, except for

    1. Anything related with preprocessor.
    2. Comment symbols.
    3. Digraphs or trigraphs.
    4. Any constructs with “variable” text inside.

    In the first puzzle function must access variable “variable” declared below, not any other.

    Click to read more ...

    Thursday
    Jul092009

    Towards C# vs C++ performance

    Have you read my post about LINQ performance? Well, are you interested in how quick is the C++ code that perform exactly the same thing? You’ll get the answer right now:

    C++ mean: 0.500 var: 0.0833 time: 3.962 sec

    So, again, why C# is slow? =)

    UPDATE (July 31, 2009):

    I’m just a stupid guy who has accidentally (I swear!) passed really huge array parameter by value (!) twice (!!!). Fixing myself:

    C++ mean: 0.500 var: 0.0833 time: 1.888 sec

    I have another question now. Why the hell is C# code doing exactly the same thing (in a way that seems effective to me) so slow compared to its C++ analogue? Is it JIT compilation overhead for those two functions? Nope. Checked it. Probably, JIT optimizer sucks? But what can C++ optimizer do what .NET JIT can not in the case of simple sequential access? Maybe access through enumerator is less effective than indexing? And here we go:

    Indexing: mean=0,5000 var=0,0833 time=00:00:00.7956019

    It’s more than two times faster than the C++ version! And more than 3 times faster than the one that uses enumerators in C#! Currently looking for explanation…

    UPDATE 2 (July 31, 2009):

    Well, .NET JITer is not as smart as I think it is. It looks like foreach applied to array actually executes all that enumerator-related stuff while the most simple for loop is heavily optimized.

    Wednesday
    Jul082009

    LINQ

    I really like LINQ feature in .NET. For me it’s what people call syntactic sugar. It allows me to reduce amount of code I’m writing dramatically in many cases. Look at the following example - http://pastebin.com/d10c04029

    This code estimates mean and variance of distribution given a sequence of samples. Two versions are provided. The first one uses .NET 2.0 features only and the second uses LINQ. As you can see, LINQ version is very short and awesome-looking even for this very simple example.

    Are there any drawbacks? Yeah, unfortunately. And, of course, that drawbacks are performance related. Let’s compare time required for processing sequence (System.Array of double) containing 50 million samples (drawn from [0,1]-uniform distribution if that matter something for someone):

    Normal: mean=0,5000 var=0,0833 time=00:00:03.6591708
    LINQ: mean=0,5000 var=0,0833 time=00:00:04.5832141

    Why is the LINQ version 1.5 times slower? Well, the obvious answer is that it should call a function (which is not inlined) for every member of the sequence, at least while calculating variance, and the first version should not. And, probably, computational cost of a function call is comparable to a few math operations (as it should be). So, use LINQ as a syntactic sugar only when the function you want to apply to the sequence elements is much more complex than a few additions and multiplications or when your collection is small and you just don’t care about processing time.

    Read more about LINQ performance there - http://davepeck.org/linq-collection-performance/.

    Strange observation: when passing List<double> instead of array, LINQ version takes about 6 seconds while performance of the normal version does not change.

    Thursday
    Jun252009

    Simple K-fold cross-validation tool for MATLAB

    MATLAB has a function for cross-validation, crossvalind. But it’s not very useful when you need to perform many experiments for different learning algorithms. That’s why I’ve implemented little tool for performing k-fold cross-validation experiments. Each experiment run can be parametrized with any learning algorithm and loss function. All you need is to wrap all your learning stuff in a bunch of simple MATLAB functions and then pass those functions as a parameters to the CrossValidation function.

    Currently there are provided a few wrappers for the classification and regression algorithms presented in MATLAB: decision trees, SVM, KNN and generalized linear models. Three loss function are also provided: RMSE, MAD and misclassification loss.

    Currenlty I am using this tool very intensively at work, so new versions with more algorithms, loss functions and features are likely to appear.

     Tool is available here.

    Sunday
    May172009

    RoboZZle solver update

    I’ve just updated my RoboZZle solver. There are two major changes:

    • I’ve fixed bug with program execution loop detection using approach proposed by Igor Ostrovsky there. So it can solve “Count the tiles“-like puzzles now.
    • I’ve added the whole new solving approach based onto evolutionary algorithm (implemented using AForge.NET library). It is not very useful just for now, but still can solve some complicated puzzles. I’ve used eaten star count as a fitness function for program. So, if there is only one or two stars on the field, this approach works just like random search. But if you know correct path for your robot (it is obvious in many puzzles), you can mark it all with stars using field editor integrated in the solver, giving your guess to the evolutionary algorithm. The main problem with this approach is that in many puzzles correct program is very different from the program that can collect almost all the stars on the field. I need better fitness function, and probably, some other crossover and mutation ops. Currently chromosome is just a concatenation of all the program functions’ code. Mutation operator changes color or operation at random position of the chromosome. 2-point crossover exchange operations between two chromosomes.

    Btw, I will be really happy if you add a comment to this post describing what puzzles can (and can not) be solved using brute-force or evolutionary approach. That information will help me to improve my solver a lot.

    Thursday
    Mar052009

    RoboZZle - part 2

    Due to interest in my RoboZZle puzzle solver of project developer Igor Ostrovsky (who by the way has a blog with tons of really interesting stuff) I’ve decided that it would be interesting to continue development. The first thing I made was fixing small nasty bugs and improving solver UI. Then I’ve created project page on google code. Now anyone (who wants) can participate in developing this amazing (as I think) application.

    Currently solver has at least one unsolved algorithmic problem with detecting infinite loops when testing generated problem. Program state history tracking with state hash set is used in current implementation. State is represented as a vector of five numbers S=(X, Y, Direction, Function, Instruction). When solver detects that current execution state has been seen before, it decides that program had entered into infinite loop and interrupts it. This algorithm detects loops very fast but, unfortunately, sometimes it can interrupt a correct program. For example, in the problem “Count the tiles” robot should rotate consequently five times on the one of the red tiles. And the state before the fifth rotation is exactly the same as before the first one. That’s why my solver can’t solve that task for now. I should think about implementing more correct but still efficient way of execution loop detection. Any suggestions are welcome!

    We have also plans to make more handy puzzle image recognition. Screenshot should be accurately cropped in your favorite image editor first when using current implementation. That can’t be called “comfortable”. Ability to recognize robot position and orientation will be added too.

    Everyone who just wants to see the solver in action can download binaries here (.NET 3.5 required).

    Updated solver UI:

    Sunday
    Mar012009

    RoboZZle

    As every programmer or mathematician, I like problems and puzzles a lot. I can’t say I’m really-really good on it, but I’ve achieved some success. One of my favorite problem databases is Project Euler. Problems presented there are related both to math and programming. And it’s really nice that for solving less or more complex problems there you should be familiar with both disciplines. So if you have some free time, I suggest you try it. It can help you with keeping your mind clear and bright. Other good thing about it is that new problem is added to the database every week. So it’s non-stop process of self-perfection.

    Recently my friend Ilya has shown me another puzzle-like project which is completely different from Project Euler. It’s called RoboZZle. Each problem consists of a field colored with three (or less) different colors with stars placed on it and little robot. You should write programs for that robot to help it collect all the stars. Robot command language is very simple: there are only rotation and step commands. Complexity comes with the fact that subroutines and recursion are allowed, but number of subroutines and instructions in each subroutine is limited. Puzzles there differ from very simple that can be solved in 30 seconds to really complex taking about an hour.

    After solving near 40 puzzles I though that it would be really fun to make an automatic solver for RoboZZle. I’ve coded simple algorithm that uses backtracking to generate different programs of specified length and check if generated program is a solution to puzzle. After few simple optimizations made (like no right rotation followed by left or no three consequent rotations in the same direction) it has shown really good speed. I’ve solved then about ten puzzles from Moderate and Hard categories using my solver. To simplify puzzle input process my friend Ilya wrote a simple program converting screenshot of a puzzle from web-site into textual description.
    Currently my solver can’t solve problems with more than 9 free slots available and all the three colors presented on the field because of huge size of program space. I hope I’ll improve it later in some ways and put on a google code.



    Btw, Windows 7 is a great piece of software.

    Wednesday
    Feb112009

    C++ vs C#

    I totally hate C++. And every day my hate grows up. But that is not just about my programming language religion. We should look deeper.

    I know C++ pretty well. I’ve read all the stuff like Effective C++, More Effective C++ and Modern C++ Design and understand all of it. And it was really hard. Now I can do amazing things with templates and write robust error-free code. Do I want it? No. It is still really hard for me (and for everyone, I guess) to write, read and maintain such a code. That snippet looks really ugly in C++:
    for (std::vector<double>::iterator it = values.begin();
    it != values.end();
    ++it)
    DoStuff(*it);
    And here is it’s C# equivalent:
    foreach (double value in values)
    DoStuff(value);
    And I’m pretty sure (I have some experience, you know) that 99% of equivalent code pieces are much more simple in C# than in C++. When you read and maintain a lot of code, it is very important.
    Next, I hate includes. There is no good reason for good programming language to have such a non-obvious and non-robust way to incorporate different subsystems or just pieces of code. Really! IDE can do it for me! In .NET there are no problems with include paths. And I can just reference some library to get access to it’s components. No precompiled headers, no include (or lib) directory setup. That’s all about rapid and easy development.
    And the last thing that’s important for me: in most cases (except of writting really performance-sensitive code) you should not care about memory deallocation at all. Execution environment can do it for you. Rather quickly. And it does. How much errors have you ever had with memory allocation and deallocation in C++? I have a lot. More than any other kinds of errors. It’s like a pain.

    But, of course, C++ is fast. Really fast. In computer vision and 3D rendering there are a lot of very performance-sensitive stuff. And the only choise you have is to code it in C++. But don’t forget: there are wrappers for everything. There is XNA, Emgu.CV and CUDA.NET. There is much more than that. You can still write performance-sensitive code in a managed environment with all it’s benefits.

    I forget to mention wonderful tools for .NET developer such a Resharper, FxCop, StyleCop and many others. There are no technical possibilities to create such a useful and powerful tool for a non-managed language like C++.

    Let’s sum up:
    .NET (C#)
    1. Easy language to code, read and maintain.
    2. No problems with library version control, includes and stuff.
    3. Fast and robust garbage collection.
    4. Very powerful standart library.
    5. Cool IDE and tools.
    6. Quite fast.
    C++
    1. Really fast.
    2. There is Boost.
    So that’s my proposal: when you want to code something, think about implementing it in C# first and consider C++ as the last option. I’m pretty sure you’ll like it as much as I do.