Useful Libraries

"dplyr"

To use this, the data must be tidy.

The dplyr package provides a concise set of operations for managing data frames. With these functions we can do a number of complex operations in just a few lines of code. In particular, we can often conduct the beginnings of an exploratory analysis with the powerful combination of group_by() and summarize().
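As a minimal sketch of the group_by() and summarize() combination mentioned above, the lines below summarize the built-in mtcars data set. The choice of mtcars and of the cyl and mpg columns is only an assumption for illustration; any data frame with a grouping column would do.

```r
library(dplyr)

# mtcars ships with R; grouping by cylinder count is just an example.
by_cyl <- mtcars %>%
  group_by(cyl) %>%            # one group per distinct cyl value
  summarize(
    n_cars   = n(),            # number of rows in each group
    mean_mpg = mean(mpg)       # average miles per gallon in each group
  )

print(by_cyl)
```

The result has one row per group, which is often enough to start an exploratory analysis.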

One important contribution of the dplyr package is that it provides a “grammar” (in particular, verbs) for data manipulation and for operating on data frames. 

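Pipe Operator - %>%

The pipe passes the result of one step to the next. With it, dplyr can perform
  1. Select - select()
  2. Filter - filter()
  3. Sorting - arrange()
  4. Rename - rename()
  5. Mutate - mutate()
  6. Group by - group_by()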
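The operations listed above can be chained with the pipe. The following is a small sketch on a made-up data frame; the data frame toy and its columns (quantity, price, clarity, note) are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical toy data; column names are assumptions for illustration.
toy <- data.frame(
  quantity = c(1, 2, 3, 4, 5),
  price    = c(50, 40, 30, 20, 10),
  clarity  = c("A", "A", "B", "B", "C"),
  note     = c("v", "w", "x", "y", "z")
)

result <- toy %>%
  select(quantity, price, clarity) %>%  # keep only these columns
  filter(quantity > 1) %>%              # keep rows where quantity > 1
  rename(qty = quantity) %>%            # rename quantity to qty
  mutate(total = qty * price) %>%       # add a derived column
  arrange(desc(total))                  # sort by total, largest first

print(result)
```

Each verb takes a data frame and returns a data frame, which is why the steps compose so easily.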


"ggplot"


*Scatter Plot* (geom_point)
*Histogram* (geom_histogram)
*Density* (geom_density)

*Boxplots* (geom_boxplot)


To draw only the line, just remove geom_point() from the scatter plot code.

Code

1

ggplot(data, aes(x=quantity, y=price)) + geom_point() + geom_smooth()

2
ggplot(data, aes(x=quantity, y=price, color = size, size = variable)) + geom_point() + geom_smooth()

#geom_smooth(se=FALSE) - line without the grey confidence band
#geom_smooth() - line with the grey confidence band
#geom_smooth(se=FALSE, method='lm') - straight line (linear model fit) without the band
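A runnable sketch of the scatter plot with a fitted line, using a made-up data frame. The object toy and its values are hypothetical; only the column names quantity and price follow the notes above.

```r
library(ggplot2)

# Hypothetical toy data for illustration only.
toy <- data.frame(
  quantity = 1:10,
  price    = c(12, 15, 14, 19, 22, 21, 27, 30, 29, 35)
)

# Points plus a straight-line fit; se = FALSE hides the grey band.
p <- ggplot(toy, aes(x = quantity, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

print(p)
```

Dropping the geom_point() layer from this code leaves only the fitted line.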

Faceting
*The graph breaks into one panel per clarity value (say, 10 values)
ggplot(data, aes(x=quantity, y=price, color = size, size = variable)) + geom_point() + facet_wrap(~ clarity)

ggplot(data, aes(x=quantity, y=price, color = size, size = variable)) + geom_point() + facet_grid(color~ clarity)
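To see the difference between the two faceting functions, here is a small sketch on made-up data. The data frame toy and the column grade are hypothetical, chosen so that every grade and clarity combination appears once.

```r
library(ggplot2)

# Hypothetical toy data: 3 clarity levels crossed with 2 grades.
toy <- data.frame(
  quantity = c(1, 2, 3, 4, 5, 6),
  price    = c(10, 12, 15, 11, 18, 20),
  clarity  = rep(c("low", "mid", "high"), times = 2),
  grade    = rep(c("fair", "good"), each = 3)
)

base <- ggplot(toy, aes(x = quantity, y = price)) + geom_point()

# facet_wrap: one panel per clarity value, wrapped into rows as needed.
p_wrap <- base + facet_wrap(~ clarity)

# facet_grid: rows by grade, columns by clarity.
p_grid <- base + facet_grid(grade ~ clarity)

print(p_wrap)
print(p_grid)
```

With this data, facet_wrap() gives 3 panels and facet_grid() gives a 2 by 3 grid.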

ADDING TITLE TO CHART

ggplot(data, aes(x=quantity, y=price, color = size, size = variable)) + geom_point() + ggtitle("my visual scatter plot") + xlab("quant")+xlim(0,2)


#ggtitle("my visual scatter plot") - gives the chart a title
#xlab("quant") - sets the x-axis label
#xlim(0,2) - restricts the x-axis range to 0 to 2
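One detail worth knowing: xlim() removes points that fall outside the range (with a warning), while coord_cartesian() only zooms the view. The sketch below shows the zooming variant and uses labs() as an alternative to ggtitle() and xlab(). The data frame toy is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data; two points lie beyond quantity = 2.
toy <- data.frame(
  quantity = c(0.5, 1.0, 1.5, 2.5, 3.0),
  price    = c(10, 14, 19, 31, 35)
)

# labs() sets the title and the x-axis label in one call.
p_labeled <- ggplot(toy, aes(x = quantity, y = price)) +
  geom_point() +
  labs(title = "my visual scatter plot", x = "quant")

# Zoom to 0..2 on the x axis without discarding any rows.
p_zoom <- p_labeled + coord_cartesian(xlim = c(0, 2))

print(p_zoom)
```

The distinction matters most when a layer such as geom_smooth() is fitted to the data, since dropped points no longer influence the fit.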

Histograms and Density Plots

1
ggplot(data, aes(x=quantity) )+ geom_histogram(binwidth=200)

2 - gives a stacked histogram colored by clarity

ggplot(data, aes(x=quantity, fill = clarity)) + geom_histogram()


#We can apply the facet option as above
#scales = "free_y" lets each facet's histogram use its own y-axis scale (see 3)

3
ggplot(data, aes(x=quantity)) + geom_histogram(binwidth=200) + facet_wrap(~ clarity, scales = "free_y")
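A self-contained sketch of the faceted histogram, with simulated data standing in for the real data set. The data frame toy, the group sizes, and the binwidth are assumptions made for the example.

```r
library(ggplot2)

set.seed(42)  # make the simulated data reproducible

# Hypothetical data: one small group and one large group.
toy <- data.frame(
  quantity = c(rnorm(50, mean = 1000, sd = 300),
               rnorm(500, mean = 1500, sd = 300)),
  clarity  = rep(c("low", "high"), times = c(50, 500))
)

# scales = "free_y" gives each panel its own y axis, so the
# small group is not flattened by the large one.
p_hist <- ggplot(toy, aes(x = quantity)) +
  geom_histogram(binwidth = 200) +
  facet_wrap(~ clarity, scales = "free_y")

print(p_hist)
```

Free scales make the shape of each distribution easier to read, at the cost of making bar heights harder to compare across panels.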


Density Plots

/Output graph is drawn in black/

ggplot(data, aes(x=quantity) )+ geom_density()

/Output graph is colored by group/ (helpful for comparing multiple distributions)

ggplot(data, aes(x=quantity, color = size) )+ geom_density()


Boxplots and Violin Plots

Boxplots (if data is normally distributed)
ggplot(data, aes(x=quantity, y=price)) + geom_boxplot() 

If there are many outliers, use a log scale on the y axis:

ggplot(data, aes(x=quantity, y=price)) + geom_boxplot()  + scale_y_log10()


Violin Plots (if data is not normally distributed)

ggplot(data, aes(x=quantity, y=price)) + geom_violin()  + scale_y_log10()
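Boxplots and violin plots usually compare a numeric variable across categories, so the sketch below puts a categorical column on the x axis. If x is continuous, geom_boxplot() draws a single box unless a group aesthetic is supplied. The data frame toy and its simulated prices are hypothetical.

```r
library(ggplot2)

set.seed(1)  # make the simulated data reproducible

# Hypothetical skewed prices (log-normal, so all values are positive).
toy <- data.frame(
  clarity = rep(c("A", "B", "C"), each = 20),
  price   = c(rlnorm(20, meanlog = 5),
              rlnorm(20, meanlog = 6),
              rlnorm(20, meanlog = 7))
)

# One box per clarity level, on a log10 y axis.
p_box <- ggplot(toy, aes(x = clarity, y = price)) +
  geom_boxplot() +
  scale_y_log10()

# Same data as violins, which show the full shape of each distribution.
p_violin <- ggplot(toy, aes(x = clarity, y = price)) +
  geom_violin() +
  scale_y_log10()

print(p_box)
print(p_violin)
```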

Saving the plots

Store the plot in an object:

d = ggplot(data, aes(x=quantity, y=price)) + geom_boxplot()

The plot is now stored in the object called d.


Image format

ggsave(filename = "box1.jpeg", d)

ggsave(filename = "box1.png", d)

PDF format

ggsave(filename = "box1.pdf", d)
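ggsave() picks the output format from the file extension, and it also accepts width and height. A minimal sketch, assuming the built-in mtcars data and a temporary directory as the destination; the size values are arbitrary.

```r
library(ggplot2)

# Example plot built from mtcars, which ships with R.
d <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# Write to a temporary folder so the example leaves no files behind.
out_file <- file.path(tempdir(), "box1.pdf")

# The .pdf extension selects the PDF device; width and height are in inches.
ggsave(filename = out_file, plot = d, width = 6, height = 4)

print(file.exists(out_file))
```

Changing the extension to .png or .jpeg switches the format without any other change to the call.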


Or use the graphical method:

from the output window, choose Export,
then save as image or PDF (you can select the aspect ratio for the image).






