Loading required package: lattice
Attaching package: 'caret'
The following object is masked from 'package:purrr':
lift
library(e1071)
library(caTools)
library(yardstick)
For binary classification, the first factor level is assumed to be the event.
Use the argument `event_level = "second"` to alter this as needed.
Attaching package: 'yardstick'
The following objects are masked from 'package:caret':
precision, recall, sensitivity, specificity
The following object is masked from 'package:readr':
spec
library(naivebayes)
naivebayes 0.9.7 loaded
library(ggplot2)
library(psych)
Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':
%+%, alpha
library(sjPlot)
Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
library(klaR)
Loading required package: MASS
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
Rows: 26941 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): season_round, status, constructorRef, weather, stop, label
dbl (16): season, round, driverId, raceId, circuitId, position, points, grid...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
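The import and pre-processing chunks are collapsed in the rendered page; they are reproduced here from the document source (the per-column conversions are lightly condensed into a loop):

```r
library(readr)

# Import the cleaned race data; show_col_types = FALSE silences the column-spec message above
df <- read_csv("../../data/02-model-data/data_cleaned.csv", show_col_types = FALSE)

# Encode the character columns as integer codes (as in the source, condensed into a loop)
for (col in c("status", "constructorRef", "weather", "stop", "label")) {
  df[[col]] <- as.integer(factor(df[[col]], levels = unique(df[[col]])))
}

# Drop the first three identifier columns and make the target a factor
df <- df[-c(1:3)]
df$label <- as.factor(df$label)

# 80/20 train/test split
set.seed(1973)
sample <- sample(c(TRUE, FALSE), nrow(df), replace = TRUE, prob = c(0.8, 0.2))
train <- df[sample, ]
test <- df[!sample, ]
```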
Bayes’ Theorem: In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule), named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes’ theorem allows the risk to an individual of a known age to be assessed more accurately (by conditioning it on their age) than simply assuming that the individual is typical of the population as a whole.
The Naive Bayes algorithm is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
A fruit might be categorized as an apple, for instance, if it is red, rounded, and around 3 inches in diameter. Even if these characteristics depend on one another or on the presence of other characteristics, each of these traits separately increases the likelihood that this fruit is an apple, which is why it is called “Naive.”
A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes can outperform even highly sophisticated classification methods on some problems.
Bayes’ Theorem can be written as the following equation (the factored form for several predictors is shown after the list of terms below): \[ P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)} \]
In the above equation:
P(C|X) is the posterior probability of class (C, target) given predictor (X, attributes).
P(C) is the prior probability of class.
P(X|C) is the likelihood which is the probability of predictor given class.
P(X) is the prior probability of predictor.
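With several predictors, the “naive” independence assumption lets the likelihood factor across features, so for \(X = (x_1, \dots, x_n)\) the classifier scores each class as
\[ P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C) \]
The denominator \(P(X)\) is the same for every class, so the class with the highest score is the prediction.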
How does Bayes’ Theorem work?
Let’s take an example: a pathology lab runs a test for a disease “D” with two results, “Positive” and “Negative.” The lab guarantees the test is 99% accurate: if you have the disease, it returns positive 99% of the time, and if you don’t, it returns negative 99% of the time. If 3% of all people have this disease and your test comes back “positive,” what is the probability that you actually have the disease?
To solve this problem, we use conditional probability.
Probability of having Disease D: P(D) = 0.03 = 3%
Probability that the test is “positive” given the patient has the disease: P(Pos | D) = 0.99 = 99%
Probability of not having Disease D: P(~D) = 0.97 = 97%
Probability that the test is “positive” given the patient does not have the disease: P(Pos | ~D) = 0.01 = 1%
To calculate the probability that the patient actually has the disease, i.e. P(D | Pos), we use Bayes’ theorem:
P(Pos) = P(Pos | D) · P(D) + P(Pos | ~D) · P(~D) = 0.99 × 0.03 + 0.01 × 0.97 = 0.0394
P(D | Pos) = P(Pos | D) · P(D) / P(Pos) = (0.99 × 0.03) / 0.0394 ≈ 0.7538
So there is roughly a 75% chance that the patient is actually suffering from the disease. This is how Bayes’ Theorem works. (Reference: https://dataaspirant.com/naive-bayes-classifier-machine-learning)
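As a quick check, a few lines of R reproduce this arithmetic (an illustration only, not part of the race-data pipeline):

```r
# Posterior probability of disease given a positive test, via Bayes' theorem
p_d <- 0.03        # prior: P(D)
p_pos_d <- 0.99    # sensitivity: P(Pos | D)
p_pos_nd <- 0.01   # false positive rate: P(Pos | ~D)

p_pos <- p_pos_d * p_d + p_pos_nd * (1 - p_d)  # total probability of a positive test: 0.0394
p_pos_d * p_d / p_pos                          # P(D | Pos) ~ 0.7538
```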
Types of Naive Bayes Algorithms:
Gaussian Naïve Bayes Classifier: In Gaussian Naïve Bayes, continuous values associated with each feature are assumed to be distributed according to a Gaussian (normal) distribution. When plotted, it gives a bell-shaped curve which is symmetric about the mean of the feature values (see the sketch after this list).
Multinomial Naïve Bayes Classifier: Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.
Bernoulli Naïve Bayes Classifier: In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence (i.e. a word occurs in a document or not) features are used rather than term frequencies (i.e. frequency of a word in the document).
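The Gaussian variant is the one relevant here, since the model below is fit on numeric features. A minimal sketch on the built-in iris data (an illustration, not part of this analysis) shows how naive_bayes() handles numeric predictors with per-class normal densities:

```r
library(naivebayes)

# With numeric predictors, naive_bayes() estimates a Gaussian (mean, sd) per feature and class
m_gauss <- naive_bayes(Species ~ ., data = iris)
m_gauss                                           # printed tables show the class-conditional means/sds
predict(m_gauss, head(iris[, -5]), type = "prob") # posterior class probabilities for a few rows
```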
Applications of the Naive Bayes Algorithm:
Real-time Prediction: Naive Bayes is an eager learning classifier and it is fast, so it can be used for making predictions in real time.
Multi-class Prediction: The algorithm is also well known for multi-class prediction; we can predict the probability of multiple classes of the target variable.
Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text classification (they handle multi-class problems well and benefit from the independence assumption) and often achieve higher success rates there than other algorithms. As a result, Naive Bayes is widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment).
Recommendation System: A Naive Bayes classifier combined with collaborative filtering builds a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
Advantages:
It is easy and fast to predict the class of a test data set, and it also performs well in multi-class prediction.
When the assumption of independence holds, a Naive Bayes classifier performs better than other models such as logistic regression, and it needs less training data.
It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
Disadvantages:
If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the “zero frequency” problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation (see the sketch after this list).
Naive Bayes is also known to be a poor probability estimator, so its predicted class probabilities should not be taken too literally.
Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible that we get a set of predictors which are completely independent.
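A minimal sketch of Laplace smoothing with the naivebayes package, using its laplace argument (the value 1 is illustrative and was not used in the original analysis, which kept the default of 0):

```r
library(naivebayes)

# Adding a pseudo-count of 1 to every category count means a level unseen in
# training no longer forces a zero class-conditional probability.
model_smooth <- naive_bayes(label ~ ., data = train, laplace = 1)
```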
Above are the density line plots of each feature variable for all three label values.
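For reference, the chunk (from the document source) that fits the two models and draws these per-feature density plots uses klaR::NaiveBayes for plotting and naivebayes::naive_bayes for the predictions below:

```r
set.seed(1973)
model1 <- NaiveBayes(label ~ ., data = train)   # klaR model, used only for plot()
model <- naive_bayes(label ~ ., data = train)   # naivebayes model, used for prediction
plot(model1)                                    # class-conditional density plot per feature
```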
train_pred = predict(model, train)
train_cm = table(train_pred, train$label)
confusionMatrix(train_cm)
Warning: predict.naive_bayes(): more features in the newdata are provided as
there are probability tables in the object. Calculation is performed based on
features to be found in the tables.
Confusion Matrix and Statistics
train_pred 1 2 3
1 2375 327 0
2 310 3468 256
3 0 2569 12242
Overall Statistics
Accuracy : 0.8393
95% CI : (0.8344, 0.8442)
No Information Rate : 0.58
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6971
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8845 0.5449 0.9795
Specificity 0.9827 0.9627 0.7161
Pos Pred Value 0.8790 0.8597 0.8265
Neg Pred Value 0.9836 0.8346 0.9620
Prevalence 0.1246 0.2954 0.5800
Detection Rate 0.1102 0.1610 0.5682
Detection Prevalence 0.1254 0.1872 0.6874
Balanced Accuracy 0.9336 0.7538 0.8478
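To make the per-class rows concrete, here is a small sketch (re-entering the train confusion matrix by hand; not from the original code) that reproduces the Class 1 figures, where balanced accuracy is the mean of sensitivity and specificity:

```r
# Train confusion matrix: rows are predictions, columns are the true labels
train_tab <- matrix(c(2375, 310, 0,
                      327, 3468, 2569,
                      0, 256, 12242),
                    nrow = 3,
                    dimnames = list(pred = c("1", "2", "3"), truth = c("1", "2", "3")))

tp <- train_tab["1", "1"]
fn <- sum(train_tab[, "1"]) - tp   # class-1 cases predicted as something else
fp <- sum(train_tab["1", ]) - tp   # other classes predicted as class 1
tn <- sum(train_tab) - tp - fn - fp

sens <- tp / (tp + fn)   # 0.8845
spec <- tn / (tn + fp)   # 0.9827
(sens + spec) / 2        # balanced accuracy ~ 0.9336
```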
test_pred = predict(model, test)
test_cm = table(test_pred, test$label)
confusionMatrix(test_cm)
Warning: predict.naive_bayes(): more features in the newdata are provided as
there are probability tables in the object. Calculation is performed based on
features to be found in the tables.
Confusion Matrix and Statistics
test_pred 1 2 3
1 669 89 1
2 85 864 68
3 0 661 2957
Overall Statistics
Accuracy : 0.8324
95% CI : (0.8222, 0.8423)
No Information Rate : 0.561
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.694
Mcnemar's Test P-Value : < 2.2e-16
Statistics by Class:
Class: 1 Class: 2 Class: 3
Sensitivity 0.8873 0.5353 0.9772
Specificity 0.9806 0.9595 0.7209
Pos Pred Value 0.8814 0.8496 0.8173
Neg Pred Value 0.9817 0.8286 0.9611
Prevalence 0.1398 0.2992 0.5610
Detection Rate 0.1240 0.1602 0.5482
Detection Prevalence 0.1407 0.1885 0.6707
Balanced Accuracy 0.9339 0.7474 0.8490
We get a training accuracy of 83.93% from our model, with balanced accuracy of 93% for Podium, 75% for Top_10 and 84% for Outside_Top_10. The test accuracy is 83.24%, with balanced accuracy of 93% for Podium, 74% for Top_10 and 85% for Outside_Top_10. There is not a lot of difference between the train and test accuracy, which suggests our model is not overfitted.
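As a sanity check on the headline number, the overall test accuracy is simply the proportion of cases on the diagonal of the test confusion matrix (matrix re-entered by hand for illustration):

```r
test_tab <- matrix(c(669, 85, 0,
                     89, 864, 661,
                     1, 68, 2957),
                   nrow = 3,
                   dimnames = list(pred = c("1", "2", "3"), truth = c("1", "2", "3")))

sum(diag(test_tab)) / sum(test_tab)   # ~ 0.8324
```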
Confusion matrix plot for the test data:
test_cm_df = data.frame(test_cm)
colnames(test_cm_df) = c('pred', 'truth', 'y')
ggplot(data = test_cm_df, mapping = aes(x = truth, y = pred)) +
  geom_tile(aes(fill = y), colour = "white") +
  labs(title = 'Confusion Matrix of Test Data') +
  scale_x_discrete(labels = c("1" = "Podium", "2" = "Top_10", "3" = "Outside_Top_10")) +
  scale_y_discrete(labels = c("1" = "Podium", "2" = "Top_10", "3" = "Outside_Top_10")) +
  geom_text(aes(label = sprintf("%1.0f", y)), vjust = 1, colour = 'white') +
  # scale_fill_gradient(low = "cyan", high = "darkgoldenrod1") +
  theme_bw() +
  theme(legend.position = "none")