This semester I took a course in Organización de Datos (Data Science) and, despite being a really exhausting course, it was by far one of the most interesting ones I have taken in all my college years. As part of the course, we were asked to form groups to participate in a Kaggle competition about sentiment prediction.
Yes, indeed. That was my first reaction. Dealing with other demanding courses and having to understand a bunch of new concepts like Perceptron, SVM, Random Forest and Bayes did take a toll on my hairline, but overall the concepts were fairly understandable. ‘Machine Learning’, as the phrase itself explains, is a procedure in which a machine ‘learns’. This is done by using two sets of data: a training set and a testing set. The algorithm we used is a Perceptron classifier.
Does your person have a beard?
Before explaining in detail how Perceptron learns, it’s better to start with an abstract example. Let’s suppose we are trying to teach our cute little puppy to sit, so when we say the ‘sit’ command and the puppy sits, it’s rewarded with a treat. The puppy associates these actions and hopefully, from now on, when he hears the command he will sit, expecting a treat. Another childish example: let’s take the riddle ‘what animal has four legs, a tail, two eyes and uses a litter-box?’. Without much effort, we can guess it’s a cat! It is fairly simple, because we associate those features (legs, tail, eyes) with a cat. I’ve mentioned the word ‘associate’ quite a few times… see the trend?
The algorithm basically does that: it associates a set of features with a result. Have you ever played ‘Guess Who?’
If you have, you can relate some features to a person. Take Herman, for example: you clearly know that some of his features are being bald, being a redhead and having a big nose.
Perceptron does exactly this: it learns from a set of features and, given a question such as ‘bald, man, with glasses, redhead’, tells us whether or not that person is Herman. That is because it is a binary classifier: it can only predict two results, yes or no. For the competition we had to predict the sentiment of a movie review (positive or negative), so in our case the features were the words.
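To make the idea of ‘words as features’ a bit more concrete, here is a minimal sketch in Python. It is not the code we used in the competition: the function names (`extract_features`, `predict`) and the hand-picked weights are made up for illustration, and splitting on whitespace is an assumed, very naive way of getting the words. The prediction is just the sum of the weights of the words in the review, turned into a yes or no answer.

```python
def extract_features(review):
    """Naive feature extraction (an assumption): every distinct lowercase word is a feature."""
    return set(review.lower().split())


def predict(weights, features):
    """Return 1 (positive) if the summed weights of the words are above zero, else 0 (negative)."""
    score = sum(weights.get(word, 0.0) for word in features)
    return 1 if score > 0 else 0


# Hand-picked weights, purely for illustration; a real Perceptron learns them.
weights = {"nice": 1.0, "awful": -1.0}

print(predict(weights, extract_features("A really nice movie")))     # 1
print(predict(weights, extract_features("An awful waste of time")))  # 0
```

Words the classifier has never seen simply count as zero, so they do not push the answer either way.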
Ok, but how does it ‘learn’?
First, let’s assume we know we will have a fixed number of features, and that we have a set of weights of the same size, one weight per feature. We read through all the reviews in the training data set (whose sentiment we know) and adjust the weights accordingly. Suppose we read “A really nice movie” and the algorithm predicts a negative review: we should increase the weights of those features. On the contrary, if we read “An awful waste of time” and it predicts a positive review, we should decrease the weights of those features.
We read the training data set as many times as required, until we have no errors (wrong predictions) or the number of errors drops to an acceptable threshold. At this point the algorithm has finished learning. Then we read through the test data set and make the predictions.
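The adjustment described above can be sketched as a single update step. Again, this is only an illustration under assumptions: the `update` function and the `learning_rate` parameter (how big each nudge is) are hypothetical names, and the weights are kept in a plain dictionary keyed by word rather than in a fixed-size array.

```python
def update(weights, features, label, learning_rate=1.0):
    """Adjust the weights after reading one review.

    label is 1 for a positive review and 0 for a negative one.
    Returns the error: +1 (predicted negative, was positive),
    -1 (predicted positive, was negative) or 0 (correct prediction).
    """
    score = sum(weights.get(word, 0.0) for word in features)
    prediction = 1 if score > 0 else 0
    error = label - prediction
    if error != 0:
        # Wrong prediction: push every word of this review towards the right answer.
        for word in features:
            weights[word] = weights.get(word, 0.0) + learning_rate * error
    return error


weights = {}
# Predicted negative but the review is positive, so the weights go up.
update(weights, {"a", "really", "nice", "movie"}, label=1)

other = {"awful": 1.0}
# Predicted positive but the review is negative, so the weights go down.
update(other, {"an", "awful", "waste", "of", "time"}, label=0)
```

When the prediction is already right, the error is zero and nothing changes, which is why the weights only move on mistakes.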
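Putting the whole loop together, a rough sketch could look like the following. The four training reviews are invented toy data, and `max_errors`, `max_passes` and `learning_rate` are assumed names for the stopping threshold, a safety limit on the number of passes, and the size of each adjustment; none of them come from our actual submission.

```python
def predict(weights, words):
    """Return 1 (positive) if the summed word weights are above zero, else 0."""
    score = sum(weights.get(w, 0.0) for w in words)
    return 1 if score > 0 else 0


def train(reviews, max_errors=0, max_passes=100, learning_rate=1.0):
    """Re-read the training set until the errors in a pass are <= max_errors."""
    weights = {}
    passes, errors = 0, 0
    for passes in range(1, max_passes + 1):
        errors = 0
        for text, label in reviews:
            words = set(text.lower().split())
            error = label - predict(weights, words)
            if error != 0:
                errors += 1
                for w in words:
                    weights[w] = weights.get(w, 0.0) + learning_rate * error
        if errors <= max_errors:
            break  # good enough: the algorithm has finished learning
    return weights, passes, errors


# Toy training set (made up): 1 = positive review, 0 = negative review.
training = [
    ("A really nice movie", 1),
    ("An awful waste of time", 0),
    ("Nice acting and a great story", 1),
    ("Awful plot and a boring story", 0),
]

weights, passes, errors = train(training)

# Predictions on unseen text, as we would do with the test data set.
for text in ["A nice story", "Awful and boring"]:
    print(text, "->", predict(weights, set(text.lower().split())))
```

The `max_passes` limit is there because, on data that cannot be separated perfectly, the number of errors may never reach zero and the loop would otherwise run forever.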
We reached a score of 0.96910 on Kaggle, which means we predicted with roughly 97% accuracy and put us in 19th place (not counting the first 4 places, which were cheaters). Overall it’s a simple algorithm which relies on simple abstractions and, depending on the given problem, its results may be really good.
For further reading, refer to http://mlwave.com/, which I used for the course and found really helpful.