The story of skin detection
Thursday, June 11, 2009 at 12:49PM
While working on my current project in the R&D department of BS Graphics, I’ve come to understand the importance of skin detection very well. The people tracking system we’re developing uses it for two different reasons:
- It allows us to reduce the number of false positives that the face detector produces. In our tracking system, faces with less than half of their pixels classified as skin are simply rejected. This was the first application of skin detection in our tracking system, and it still serves us well.
- Face detection is slow, especially when the video frame is large. So when a full-scale face search runs over a frame, we first compute both the skin mask and the foreground mask. Then only the regions built from the intersection of those masks are passed to the face detector. In most scenes this approach speeds up face detection by a factor of 2 or 3.
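Both uses above boil down to a few mask operations. Here is a minimal sketch in plain Python, assuming masks are lists of rows of 0/1 values and boxes are (x, y, width, height) tuples. All function names are made up for illustration, and a single bounding box stands in for whatever region-building step a real tracker would use.

```python
# Masks are lists of rows holding 0 or 1; boxes are (x, y, width, height).
# Names and data layout are illustrative assumptions, not the real system.

def skin_fraction(skin_mask, box):
    """Fraction of pixels inside the box that are marked as skin."""
    x, y, w, h = box
    total = w * h
    if total == 0:
        return 0.0
    skin = sum(sum(row[x:x + w]) for row in skin_mask[y:y + h])
    return skin / total

def keep_face(skin_mask, box, min_fraction=0.5):
    """Keep a detection only if at least half of its pixels are skin."""
    return skin_fraction(skin_mask, box) >= min_fraction

def intersect_masks(skin_mask, foreground_mask):
    """Pixel-wise AND of two masks of equal size."""
    return [[s & f for s, f in zip(skin_row, fg_row)]
            for skin_row, fg_row in zip(skin_mask, foreground_mask)]

def bounding_box(mask):
    """Tight box around all set pixels, or None if the mask is empty."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

The face detector would then run only inside `bounding_box(intersect_masks(skin, foreground))`, and each detection would be passed through `keep_face` before being accepted.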
But skin detection has its drawbacks and limitations. The most important one is that skin color in the frame depends heavily on camera quality and lighting conditions. So a skin classifier trained under specific lighting conditions can perform very badly when those conditions change.
During the last few months I’ve tried 3 different approaches to skin detection, and now I’m more or less satisfied. Here’s the list of what I’ve tried. Maybe it will help someone.
- The skin detector proposed by Jones and Rehg. It uses a Bayes classifier to decide whether a pixel of a given color is skin or not. The conditional distributions are learned offline from a set of images with labeled skin pixels. Gaussian mixture models are used to represent the conditional color distributions. The mixture component parameters trained by Jones and Rehg are given at the end of the paper, so you don’t need to train the classifier yourself. The only important thing is to convert the GMM back to a 3D histogram for better performance. Unfortunately, the classifier is very sensitive to camera exposure and lighting changes. It works well for photos from the web, but very badly in real-world scenarios.
- An adaptive version of the Jones-Rehg skin detector that uses information from the face detector. The conditional distribution is learned from the pixels lying in the face regions. Unfortunately, other skin-colored parts (such as arms, neck, etc.) can end up among the negative samples and corrupt the color distribution. That gave me the following idea: adaptation should not use any negative samples. Instead, it should fit some descriptive model to the skin color data we have.
- The Wimmer-Radig descriptive skin model. A simple decision rule is fitted to the skin color distribution (in normalized RGB). I re-learn a temporary skin model in some “key” frames and interpolate the final skin model between them to make the model evolve continuously. This approach is the best one so far, but heavy testing will probably reveal its problems too.
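To illustrate the "convert the GMM back to a 3D histogram" step from the first approach in the list above, here is a rough sketch. It evaluates two diagonal-covariance Gaussian mixtures (skin and non-skin) at the centre of every RGB bin once, stores the likelihood-ratio decision in a lookup table, and then classifies pixels by table lookup. The function names, the 32 bins per channel and the threshold of 1.0 are assumptions for this example. The mixture parameters in the usage note are toy values, not the ones published in the paper.

```python
import math

def gmm_density(rgb, components):
    """Density of a diagonal-covariance Gaussian mixture at an RGB point.

    Each component is (weight, mean, variance), where mean and variance
    are 3-tuples, one value per color channel.
    """
    total = 0.0
    for weight, mean, var in components:
        exponent = sum((c - m) ** 2 / v for c, m, v in zip(rgb, mean, var))
        norm = math.sqrt((2 * math.pi) ** 3 * var[0] * var[1] * var[2])
        total += weight * math.exp(-0.5 * exponent) / norm
    return total

def build_skin_lut(skin_gmm, nonskin_gmm, bins=32, threshold=1.0):
    """Precompute the likelihood-ratio decision at the centre of every bin."""
    step = 256 // bins
    centres = [i * step + step / 2 for i in range(bins)]
    lut = {}
    for ri, r in enumerate(centres):
        for gi, g in enumerate(centres):
            for bi, b in enumerate(centres):
                p_skin = gmm_density((r, g, b), skin_gmm)
                p_nonskin = gmm_density((r, g, b), nonskin_gmm)
                # Skin if P(rgb | skin) / P(rgb | non-skin) > threshold.
                lut[(ri, gi, bi)] = p_skin > threshold * p_nonskin
    return lut

def is_skin(rgb, lut, bins=32):
    """Classify an 8-bit RGB pixel with a single table lookup."""
    step = 256 // bins
    r, g, b = rgb
    return lut[(r // step, g // step, b // step)]
```

With toy one-component mixtures such as `skin = [(1.0, (200, 140, 120), (900, 900, 900))]` and `nonskin = [(1.0, (128, 128, 128), (6000, 6000, 6000))]`, the table is built once with `build_skin_lut(skin, nonskin)` and reused for every pixel of every frame, so the per-pixel cost no longer depends on the number of mixture components.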
The following video shows the 3rd approach at work. The semi-transparent red mask indicates pixels classified as skin.
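The key-frame idea behind the third approach can be sketched roughly like this. Note the assumptions: the "model" here is just a box in normalized (r, g) chromaticity fitted to face pixels, which is a hypothetical stand-in and not the actual Wimmer-Radig rule, and the blending between key frames is assumed to be linear. All names are made up for the example.

```python
def normalized_rg(rgb):
    """Map an RGB pixel to normalized chromaticity (r, g), where r + g + b = 1."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)  # black pixel: treat as neutral grey
    return (r / total, g / total)

def fit_box_model(face_pixels):
    """Fit a hypothetical box rule: the (r, g) range covered by face pixels."""
    points = [normalized_rg(p) for p in face_pixels]
    rs = [p[0] for p in points]
    gs = [p[1] for p in points]
    return (min(rs), max(rs), min(gs), max(gs))

def interpolate_model(model_a, model_b, t):
    """Linear blend of two models; t=0 gives model_a, t=1 gives model_b."""
    return tuple((1 - t) * a + t * b for a, b in zip(model_a, model_b))

def model_at_frame(frame, key_a, key_b):
    """Model for a frame between two key frames given as (frame_index, model).

    Assumes the two key frames have different frame indices.
    """
    (frame_a, model_a), (frame_b, model_b) = key_a, key_b
    t = (frame - frame_a) / (frame_b - frame_a)
    return interpolate_model(model_a, model_b, t)

def matches_model(rgb, model):
    """True if the pixel's chromaticity falls inside the model's box."""
    r, g = normalized_rg(rgb)
    r_min, r_max, g_min, g_max = model
    return r_min <= r <= r_max and g_min <= g <= g_max
```

Only positive samples (pixels inside detected faces) go into `fit_box_model`, so skin-colored arms or necks elsewhere in the frame cannot corrupt the model the way they could as negative samples.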
hr0nix | 2 Comments | classification, vision

Reader Comments (2)
Actually, I didn't read the post, but I did watch the video :) And why is there a green cross on your nose? :)
The red pixels are skin, the frame is the face outline, the cross is the center of the head in 3D space (in theory, at least), and the "B.Yangel" label at the top means that the system recognized me by my face.
Write your comments in English, don't be lazy. The blog is read mostly by non-Russian speakers )