Posts

Esquisse R Package

The purpose of this add-in is to let you explore your data quickly and extract the information it holds. It allows you to interactively visualize your data with the ggplot2 package: draw bar charts, curves, scatter plots, and histograms, then export the graph or retrieve the code that generates it. You can only create simple plots, so custom scales and the full power of ggplot2 are out of reach, but this is just the start!

Installation

Install from CRAN with:

# From CRAN
install.packages("esquisse")

Add your data, drag and drop variables, and watch the visual magic happen.
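Once installed, you can launch the add-in from the RStudio Addins menu or straight from the console. A minimal sketch using esquisser(), the package's entry point (the built-in iris data set here is just an illustration, not from the post):

library(esquisse)

# Open the interactive ggplot2 builder on a data frame
esquisser(iris)

# Or launch it empty and pick a data set from the UI
# esquisser()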

The Multi-Armed Bandit Problem - Thompson Sampling (Ad Campaign)

# Importing the dataset
dataset = read.csv('Ads_CTR_Optimisation.csv')

# Implementing Thompson Sampling
N = 10000  # number of rounds (users)
d = 10     # number of ads
ads_selected = integer(0)
numbers_of_rewards_1 = integer(d)  # times each ad earned reward 1
numbers_of_rewards_0 = integer(d)  # times each ad earned reward 0
total_reward = 0
for (n in 1:N) {
  ad = 0
  max_random = 0
  for (i in 1:d) {
    # Draw from the Beta posterior of ad i's click-through rate
    random_beta = rbeta(n = 1,
                        shape1 = numbers_of_rewards_1[i] + 1,
                        shape2 = numbers_of_rewards_0[i] + 1)
    if (random_beta > max_random) {
      max_random = random_beta
      ad = i
    }
  }
  ads_selected = append(ads_selected, ad)
  reward = dataset[n, ad]
  if (reward == 1) {
    numbers_of_rewards_1[ad] = numbers_of_rewards_1[ad] + 1
  } else {
    numbers_of_rewards_0[ad] = numbers_of_rewards_0[ad] + 1
  }
  total_reward = total_reward + reward
}

# Visualising the results
hist(ads_selected,
     col = 'blue',
     main = 'Histogram of ads selections',
     xlab = 'Ads')
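The heart of the loop is the rbeta() draw: each ad keeps a Beta(successes + 1, failures + 1) posterior over its click-through rate, and every round the ad with the highest random draw gets shown. A tiny standalone sketch with made-up counts (not from the ad data set) shows why the better arm dominates:

set.seed(42)

# Two hypothetical ads: A clicked 30/100 times, B clicked 10/100 times
draws_A = rbeta(10000, shape1 = 30 + 1, shape2 = 70 + 1)
draws_B = rbeta(10000, shape1 = 10 + 1, shape2 = 90 + 1)

# Fraction of rounds Thompson Sampling would pick A over B
mean(draws_A > draws_B)  # close to 1: A wins almost every draw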

The Multi-Armed Bandit Problem - Upper Confidence Bound (Ad Campaign)

# Upper Confidence Bound

# Importing the dataset
dataset = read.csv('Ads_CTR_Optimisation.csv')

# Implementing UCB
N = 10000  # number of rounds (users)
d = 10     # number of ads
ads_selected = integer(0)
numbers_of_selections = integer(d)  # integer(d) creates a vector of d zeros
sums_of_rewards = integer(d)
total_reward = 0
for (n in 1:N) {
  ad = 0
  max_upper_bound = 0
  for (i in 1:d) {
    if (numbers_of_selections[i] > 0) {
      average_reward = sums_of_rewards[i] / numbers_of_selections[i]
      # Upper confidence interval half-width
      delta_i = sqrt(3/2 * log(n) / numbers_of_selections[i])
      upper_bound = average_reward + delta_i
    } else {
      # Ads never shown yet get an infinite bound so each is tried once
      upper_bound = Inf
    }
    if (upper_bound > max_upper_bound) {
      max_upper_bound = upper_bound
      ad = i
    }
  }
  ads_selected = append(ads_selected, ad)
  numbers_of_selections[ad] = numbers_of_selections[ad] + 1
  reward = dataset[n, ad]
  sums_of_rewards[ad] = sums_of_rewards[ad] + reward
  total_reward = total_reward + reward
}
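The exploration bonus delta_i = sqrt(3/2 * log(n) / N_i) shrinks as an ad accumulates selections, so well-tested ads are judged mostly by their average reward while rarely shown ads keep a wide margin. A quick illustrative computation (the selection counts here are made up):

# How the exploration bonus shrinks with the number of selections,
# evaluated at round n = 10000
n = 10000
for (n_i in c(1, 10, 100, 1000)) {
  cat(sprintf("selections = %4d  delta_i = %.3f\n",
              n_i, sqrt(3/2 * log(n) / n_i)))
}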

Eclat - A strong & stylish effect

# Data Preprocessing
# install.packages('arules')
library(arules)
dataset = read.csv('Market_Basket_Optimisation.csv', header = FALSE)

# Re-read the baskets as a sparse transactions object,
# removing items duplicated within a transaction
dataset = read.transactions('Market_Basket_Optimisation.csv',
                            sep = ',',
                            rm.duplicates = TRUE)

Output: read.transactions() reports the transactions that contained duplicates:

distribution of transactions with duplicates:
1
5

summary(dataset)

transactions as itemMatrix in sparse format with
 7501 rows (elements/itemsets/transactions) and
 119 columns (items) and a density of 0.03288973

most frequent items:
mineral water          eggs     spaghetti  french fries     chocolate       (Other)
         1788          1348          1306          1282          1229         22405

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   18   19   20
1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4    1    2    1

   Min. 1st Qu.  Median    Mean 3rd Qu.
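The preview cuts off before the Eclat step itself. A minimal sketch of how the model is typically trained and inspected with arules (the support and minlen values are illustrative assumptions, not taken from the post):

# Training Eclat on the dataset
rules = eclat(data = dataset,
              parameter = list(support = 0.003, minlen = 2))

# Visualising the results: the ten most frequent itemsets
inspect(sort(rules, by = 'support')[1:10])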

Apriori

# Data Preprocessing
# install.packages('arules')
library(arules)
dataset = read.csv('Market_Basket_Optimisation.csv', header = FALSE)

# Sparse matrix: re-read the baskets as transactions,
# removing items duplicated within a transaction
dataset = read.transactions('Market_Basket_Optimisation.csv',
                            sep = ',',
                            rm.duplicates = TRUE)

Output:

distribution of transactions with duplicates:
1
5

summary(dataset)

transactions as itemMatrix in sparse format with
 7501 rows (elements/itemsets/transactions) and
 119 columns (items) and a density of 0.03288973

most frequent items:
mineral water          eggs     spaghetti  french fries     chocolate       (Other)
         1788          1348          1306          1282          1229         22405

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   18   19   20
1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4    1    2    1
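As with Eclat above, the preview stops before the Apriori call. A minimal sketch of the usual continuation (the support and confidence thresholds are illustrative assumptions):

# Training Apriori on the dataset
rules = apriori(data = dataset,
                parameter = list(support = 0.003, confidence = 0.2))

# Visualising the results: the ten rules with the highest lift
inspect(sort(rules, by = 'lift')[1:10])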

K-Means Clustering

# Importing the dataset
dataset = read.csv('Mall_Customers.csv')
dataset = dataset[4:5]  # keep columns 4 and 5, the two features to cluster

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
# library(caTools)
# set.seed(123)
# split = sample.split(dataset$DependentVariable, SplitRatio = 0.8)
# training_set = subset(dataset, split == TRUE)
# test_set = subset(dataset, split == FALSE)

# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)

# Using the elbow method to find the optimal number of clusters
set.seed(6)
wcss = vector()
for (i in 1:10) wcss[i] = sum(kmeans(dataset, i)$withinss)
plot(1:10,
     wcss,
     type = 'b',
     main = paste('The Elbow Method'),
     xlab = 'Number of clusters',
     ylab = 'WCSS')

# Fitting K-Means to the dataset
set.seed(29)
kmeans = kmeans(x = dataset, centers = 5)
y_kmeans = kmeans$cluster

# Visualising the clusters
library(cluster)
clusplot(dataset,
         y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 2,
         plotchar = FALSE,
         span = TRUE,
         main = 'Clusters of customers')
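One caveat worth noting: kmeans() starts from random centroids, so a single run can land in a poor local optimum. A small hedged variation (nstart is a standard kmeans() argument, not something the preview shows) that keeps the best of 25 random starts:

# Run 25 random initialisations and keep the one with the lowest total WCSS
set.seed(29)
kmeans_stable = kmeans(x = dataset, centers = 5, nstart = 25)

# Compare the two assignments (cluster labels may be permuted)
table(y_kmeans, kmeans_stable$cluster)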

Random Forest Classification

# Importing the dataset
dataset = read.csv('Social_Network_Ads.csv')
dataset = dataset[3:5]

# Encoding the target feature as factor
dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling (all columns except the target in column 3)
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])

# Fitting Random Forest Classification to the Training set
# install.packages('randomForest')
library(randomForest)
set.seed(123)
classifier = randomForest(x = training_set[-3],
                          y = training_set$Purchased,
                          ntree = 500)

# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-3])
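The preview ends at the prediction step; the natural next step is to score those predictions against the held-out labels. A minimal sketch (the confusion matrix is a common follow-up, not part of the preview itself):

# Making the Confusion Matrix on the test set
cm = table(test_set[, 3], y_pred)
cm

# Overall accuracy
sum(diag(cm)) / sum(cm)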