Saturday, July 24, 2010

Naive bayes classifier for discrete predictors

The naive bayes approach is a supervised learning method which is based on a simplistic hypothesis: it assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Yet, despite this, it appears robust and efficient. Its performance is comparable to other supervised learning techniques.

We introduce in Tanagra (version 1.4.36 and later) a new presentation of the results of the learning process. The classifier is easier to understand, and its deployment is also made easier.

In the first part of this tutorial, we present some theoretical aspects of the naive bayes classifier. Then, we implement the approach on a dataset with Tanagra. We compare the obtained results (the parameters of the model) to those obtained with other linear approaches such as the logistic regression, the linear discriminant analysis and the linear SVM. We note that the results are highly consistent. This largely explains the good performance of the method in comparison to others.

In the second part, we use various tools on the same dataset (Weka 3.6.0, R 2.9.2, Knime 2.1.1, Orange 2.0b and RapidMiner 4.6.0). We try above all to understand the obtained results.

Keywords: naive bayes, linear classifier, linear discriminant analysis, logistic regression, linear support vector machine, svm
Components: NAIVE BAYES, LINEAR DISCRIMINANT ANALYSIS, BINARY LOGISTIC REGRESSION, SVM, 0_1_BINARIZE
Tutorial: en_Tanagra_Naive_Bayes_Classifier_Explained.pdf
Dataset: heart_for_naive_bayes.zip
References :
Wikipedia, "Naive bayes classifier".
T. Mitchell, "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression", in Machine Learning, Chapter 1, 2005.