## Introduction

The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' Theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.
Bayes' Theorem finds the probability of an event occurring given that another event has already occurred. If *B* represents the dependent event and *A* represents the prior event, Bayes' Theorem can be stated as follows:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

To calculate the probability of *B* given *A*, the algorithm counts the number of cases where *A* and *B* occur together and divides it by the total number of cases where *A* occurs, with or without *B*.
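The counting rule described above can be sketched in a few lines of plain Python. The `records` list and the `conditional_probability` helper are made up for illustration; they are not part of any library.

```python
# Hypothetical toy records: (weather, played_outside).
records = [
    ("sunny", True), ("sunny", True), ("sunny", False),
    ("rainy", False), ("rainy", False), ("rainy", True),
    ("sunny", True), ("rainy", False),
]


def conditional_probability(records, a, b):
    """Estimate P(B = b | A = a) by counting cases in `records`."""
    # Cases where A occurs, with or without B.
    count_a = sum(1 for x, _ in records if x == a)
    # Cases where A and B occur together.
    count_a_and_b = sum(1 for x, y in records if x == a and y == b)
    if count_a == 0:
        raise ValueError(f"no records with A = {a!r}")
    return count_a_and_b / count_a


p = conditional_probability(records, "sunny", True)
print(p)  # 0.75: 3 of the 4 sunny records have played_outside == True
```

A real Naive Bayes classifier does the same kind of counting for every feature and class, then multiplies the per-feature estimates under the assumption that features are independent given the class.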

## Implementation

Scikit-learn provides several implementations of the Naive Bayes algorithm. Three commonly used flavors are *MultinomialNB*, for multinomially distributed data such as counts; *GaussianNB*, for continuous features assumed to follow a Gaussian distribution within each class; and *BernoulliNB*, for data distributed according to multivariate Bernoulli distributions, that is, binary features.
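To make the difference concrete, here is a minimal sketch that fits each of the three classes on the kind of data it expects. The tiny arrays are invented purely for illustration, so the fitted probabilities mean nothing beyond showing the API.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Made-up labels: three samples per class.
y = np.array([0, 0, 0, 1, 1, 1])

# Count features (e.g. word counts) for MultinomialNB.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                     [0, 2, 3], [0, 3, 2], [1, 2, 4]])
# Binary presence/absence features for BernoulliNB.
X_binary = (X_counts > 0).astype(int)
# Continuous features for GaussianNB.
X_real = np.array([[1.0, 2.1], [0.9, 1.9], [1.2, 2.0],
                   [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]])

models = {
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
    "GaussianNB": (GaussianNB(), X_real),
}

predictions = {}
for name, (model, X) in models.items():
    model.fit(X, y)
    # Predict the class of the first training sample, which belongs to class 0.
    predictions[name] = int(model.predict(X[:1])[0])
    print(name, predictions[name], model.predict_proba(X[:1]).round(3))
```

All three share the same `fit` / `predict` / `predict_proba` interface, so switching variants is mostly a matter of matching the class to the feature type.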

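Let's take a look at the Naive Bayes algorithm at work classifying the Iris data. The Iris features are continuous measurements (sepal and petal lengths and widths), and natural measurements like these are often reasonably well approximated by a Gaussian distribution, so we'll be using the *GaussianNB* class: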

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Parameters
n_classes = 3
plot_colors = "bry"
plot_step = 0.02
plt.rcParams["figure.figsize"] = [12, 8]

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Shuffle
    idx = np.arange(X.shape[0])
    np.random.seed(13)
    np.random.shuffle(idx)
    X = X[idx]
    y = y[idx]

    # Train
    clf = GaussianNB().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

    # Plot the training points, one color per class
    for i, color in zip(range(n_classes), plot_colors):
        mask = y == i
        plt.scatter(X[mask, 0], X[mask, 1], c=color, label=iris.target_names[i])

plt.legend(loc="upper left")
plt.show()
```

Pretty cool, isn't it!
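The plots show where the decision boundaries fall, but not how well the classifier does on data it hasn't seen. As a minimal sketch, assuming a 70/30 stratified split and macro-averaged scores (both are arbitrary choices made here, not something prescribed above), you could measure precision and recall for the same *GaussianNB* classifier like this:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=13
)

clf = GaussianNB().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Macro averaging computes the metric per class, then takes the unweighted mean.
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
print(f"precision: {precision:.3f}, recall: {recall:.3f}")
```

Lowering the training fraction is an easy way to see for yourself how the scores react to having less data.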

## Conclusion

The Naive Bayes algorithm affords fast, highly scalable model building and scoring: training and prediction time scale linearly with the number of predictors and rows. You'll need, however, a reasonably large data set in order to make reliable estimates of the probability of each class. You can use a Naive Bayes classifier with a small data set, but precision and recall are likely to stay low. For a short reminder of what those are, have a look at the performance metrics section here.