Powered by Squarespace
This form does not yet contain any fields.
    « NVidia CUDA certificate | Main | The story of skin detection »
    Thursday
    Jun182009

    Genetic weak learner and classifier trees

    I have some free time now (most of my friends went to the student military camp), so I decided to continue my last research work.

    If you have read my paper about accelerating boosting with genetic algorithm, you have an idea of what the genetic weak learner is. In the original work that learner was only applied to the families of parametric classifiers that can be represented as points in multidimensional space. Proposed approach can not be used to build an additive model consisting of single-feature stump classifiers in cases when features have no good parametric representation. For example, if feature can be represented only by it’s number in the feature pool and there is no corellation between features with close numbers, genetic algorithm used in the weak learner simply turns into random search. Another bad thing about stump classifier is that it can’t represent dependencies between variables which is always an important feature of the learning algorithm.

    Now I’m trying to fix some problems of the original approach. Here are some ideas I’ve recently came to:

    1. Boosting in fact is a stagewise additive modelling algorithm. So, on each iteration it tries to fit the tree that, if added to the current model, will minimize some loss function. It’s an optimization problem with tree as a function parameter. There exist some well-known approaches for solving such optimization problem with genetic algorithm: GP and GEP.
    2. What’s bad for the stump can be not so bad for the tree with more than one split. When mutation changes feature used for the split in some node, all the features in another nodes remain the same and, so, there is a strong corellation between chromosome states before and after the mutation.
    3. If non-parametric feature usage become possible, I will compare my approach with some popular tree boosting approaches such as gradient boosting on different UCI datasets.
    4. As I already have mentioned in the paper, genetic learner computational complexity depends only on genetic algorithm parameters (such as population size or epoch count) and, therefore, genetic tree boosting algorithm also has an opportunity to significantly outperform existing approaches in learning time.

    Also I’m going to remember my API design skills and implement a library for boosting both tree-based classifiers and tree-based regressors in some convinient way. Keep reading the blog to know more =)

     

    Reader Comments

    There are no comments for this journal entry. To create a new comment, use the form below.

    PostPost a New Comment

    Enter your information below to add a new comment.

    My response is on my own website »
    Author Email (optional):
    Author URL (optional):
    Post:
     
    Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>