Useful Libraries

"dplyr"

To use it effectively, the data should be tidy (one row per observation, one column per variable).

The dplyr package provides a concise set of operations for managing data frames. With these functions we can do a number of complex operations in just a few lines of code. In particular, we can often conduct the beginnings of an exploratory analysis with the powerful combination of group_by() and summarize().
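As a minimal sketch of the group_by() and summarize() combination mentioned above, the example below counts rows and averages a price per group. The data frame `sales` and its columns are made up for illustration; they are not from any real dataset.

```r
library(dplyr)

# Hypothetical sample data; the column names are assumptions for illustration.
sales <- data.frame(
  clarity  = c("SI1", "SI1", "VS1", "VS1", "VS1"),
  price    = c(300, 500, 400, 600, 800),
  quantity = c(1, 2, 1, 3, 2)
)

# Group the rows by clarity, then collapse each group to one summary row
# holding the number of rows and the mean price.
summary_by_clarity <- summarize(
  group_by(sales, clarity),
  n          = n(),
  mean_price = mean(price)
)

print(summary_by_clarity)
```

The result has one row per clarity value, which is often enough to start an exploratory analysis.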

One important contribution of the dplyr package is that it provides a “grammar” (in particular, verbs) for data manipulation and for operating on data frames. 

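Pipe operator: %>%

The pipe passes the result on its left as the first argument of the function on its right, so the verbs below can be chained one after another.

dplyr can perform:
  1. Select columns: select()
  2. Filter rows: filter()
  3. Sort rows: arrange()
  4. Rename columns: rename()
  5. Add or change columns: mutate()
  6. Group rows: group_by()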
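The six operations listed above can be chained with the pipe. The following is a sketch on a small made-up data frame; `sales` and all of its column names are hypothetical.

```r
library(dplyr)

# Hypothetical sample data; column names are assumptions for illustration.
sales <- data.frame(
  clarity  = c("SI1", "VS1", "SI1", "VS2", "VS1"),
  price    = c(500, 300, 900, 400, 700),
  quantity = c(2, 1, 3, 1, 2),
  note     = c("a", "b", "c", "d", "e")
)

result <- sales %>%
  select(clarity, price, quantity) %>%  # 1. keep only these columns
  filter(price >= 400) %>%              # 2. keep rows matching a condition
  arrange(desc(price)) %>%              # 3. sort, highest price first
  rename(qty = quantity) %>%            # 4. rename a column (new = old)
  mutate(unit_price = price / qty)      # 5. add a computed column

print(result)

# 6. group_by() is usually followed by summarize()
totals <- result %>%
  group_by(clarity) %>%
  summarize(total = sum(price))

print(totals)
```

Each step receives the data frame produced by the step before it, so the pipeline reads top to bottom in the order the operations happen.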


"ggplot"


Common plot types and their geoms (the package itself is called ggplot2):

*Scatter Plot* (geom_point)
*Histogram* (geom_histogram)
*Density* (geom_density)
*Boxplots* (geom_boxplot)


To show only the fitted line, remove geom_point() and keep geom_smooth().

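Code

1. Scatter plot with a smoothed trend line

ggplot(data, aes(x = quantity, y = price)) + geom_point() + geom_smooth()

2. The same plot, with point color and size mapped to other variables

ggplot(data, aes(x = quantity, y = price, color = size, size = variable)) + geom_point() + geom_smooth()

# geom_smooth()                          - trend line with the grey confidence band
# geom_smooth(se = FALSE)                - trend line without the grey band
# geom_smooth(se = FALSE, method = "lm") - straight-line (linear model) fit, no band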
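To try the geom_smooth() options without a dataset at hand, here is a runnable sketch on simulated data. The data frame `sample_data` is hypothetical and only stands in for `data`; its values are randomly generated.

```r
library(ggplot2)

# Hypothetical sample data standing in for `data`; values are simulated.
set.seed(42)
sample_data <- data.frame(quantity = runif(60, min = 0, max = 2))
sample_data$price <- 1000 * sample_data$quantity + rnorm(60, sd = 150)

# Points plus a straight-line fit, with the grey confidence band turned off.
p <- ggplot(sample_data, aes(x = quantity, y = price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)
```

Dropping `method = "lm"` gives the default curved smoother, and dropping `se = FALSE` brings the grey band back.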

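Faceting

facet_wrap() splits the graph into one panel per value of a variable; if clarity has, say, 10 values, you get 10 panels.
ggplot(data, aes(x = quantity, y = price, color = size, size = variable)) + geom_point() + facet_wrap(~ clarity)

facet_grid() arranges the panels in a grid, with one variable on the rows and another on the columns (here the data is assumed to have a column named color).
ggplot(data, aes(x = quantity, y = price, color = size, size = variable)) + geom_point() + facet_grid(color ~ clarity)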
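The difference between the two faceting functions is easier to see on a small example. This sketch uses simulated data; `sample_data` and the column `cut` (used here as the row variable in place of color) are hypothetical.

```r
library(ggplot2)

# Hypothetical sample data; `cut` plays the role of the row variable.
set.seed(1)
sample_data <- data.frame(
  quantity = runif(90, min = 0, max = 2),
  clarity  = rep(c("SI1", "VS1", "VS2"), each = 30),
  cut      = rep(c("Good", "Ideal"), times = 45)
)
sample_data$price <- 1000 * sample_data$quantity + rnorm(90, sd = 150)

base <- ggplot(sample_data, aes(x = quantity, y = price)) + geom_point()

# One panel per clarity value (3 panels), wrapped into rows as needed.
p_wrap <- base + facet_wrap(~ clarity)

# A grid with cut on the rows and clarity on the columns (2 x 3 panels).
p_grid <- base + facet_grid(cut ~ clarity)

print(p_wrap)
print(p_grid)
```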

Adding a Title to the Chart

ggplot(data, aes(x = quantity, y = price, color = size, size = variable)) + geom_point() + ggtitle("my visual scatter plot") + xlab("quant") + xlim(0, 2)


# ggtitle("my visual scatter plot") - sets the chart title
# xlab("quant") - sets the x-axis label
# xlim(0, 2) - limits the x axis to the range 0 to 2 (points outside this range are dropped)

Histograms and Density Plots

1
ggplot(data, aes(x = quantity)) + geom_histogram(binwidth = 200)

2 (stacked histogram, colored by clarity)

ggplot(data, aes(x = quantity, fill = clarity)) + geom_histogram()


# Faceting can be applied here too, as in the Faceting section above.
# scales = "free_y" lets each facet's histogram use its own y-axis scale (example 3).

3
ggplot(data, aes(x = quantity)) + geom_histogram(binwidth = 200) + facet_wrap(~ clarity, scales = "free_y")
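The sketch below shows why scales = "free_y" helps when the groups differ a lot in size. The data is simulated and `sample_data` is a hypothetical stand-in for `data`; note that binwidth is expressed in the units of the x variable, so 200 only makes sense for values that span hundreds or thousands.

```r
library(ggplot2)

# Hypothetical sample data: one large group and one small group.
set.seed(7)
sample_data <- data.frame(
  quantity = c(rnorm(100, mean = 1000, sd = 300),
               rnorm(20,  mean = 1000, sd = 300)),
  clarity  = rep(c("SI1", "VS1"), times = c(100, 20))
)

# With a shared y axis the small group would look nearly flat;
# free_y gives each panel its own count axis.
p <- ggplot(sample_data, aes(x = quantity)) +
  geom_histogram(binwidth = 200) +
  facet_wrap(~ clarity, scales = "free_y")

print(p)

# layer_data() returns the computed bins; every observation falls in
# exactly one bin, so the counts add up to the number of rows.
bins <- layer_data(p, 1)
sum(bins$count) == nrow(sample_data)
```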


Density Plots

A single density curve (drawn in black by default):

ggplot(data, aes(x = quantity)) + geom_density()

One colored curve per value of size (helpful for comparing multiple distributions):

ggplot(data, aes(x = quantity, color = size)) + geom_density()


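Boxplots and Violin Plots

Boxplots (if the data is roughly normally distributed)
ggplot(data, aes(x = quantity, y = price)) + geom_boxplot()

# geom_boxplot() expects a categorical x. If quantity is continuous, map a categorical
# variable such as clarity to x instead; otherwise ggplot2 draws a single box.

If there are many outliers, switch the y axis to a log scale:

ggplot(data, aes(x = quantity, y = price)) + geom_boxplot() + scale_y_log10()


Violin Plots (if the data is not normally distributed)

ggplot(data, aes(x = quantity, y = price)) + geom_violin() + scale_y_log10()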
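For a runnable comparison of the two plot types, the sketch below uses simulated, right-skewed prices grouped by a categorical variable. `sample_data` is hypothetical, and using clarity on the x axis is an assumption made for the example.

```r
library(ggplot2)

# Hypothetical sample data: log-normal prices produce many high outliers.
set.seed(3)
sample_data <- data.frame(
  clarity = rep(c("SI1", "VS1", "VS2"), each = 50),
  price   = rlnorm(150, meanlog = 7, sdlog = 1)
)

# Boxplot per clarity value; the log scale pulls the long upper tail in.
p_box <- ggplot(sample_data, aes(x = clarity, y = price)) +
  geom_boxplot() +
  scale_y_log10()

# Violin plot of the same data; the outline shows the shape of each distribution.
p_violin <- ggplot(sample_data, aes(x = clarity, y = price)) +
  geom_violin() +
  scale_y_log10()

print(p_box)
print(p_violin)
```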

Saving the Plots

Store the plot in an object:

d <- ggplot(data, aes(x = quantity, y = price)) + geom_boxplot()

The plot is now stored in the object called d. ggsave() picks the output format from the file extension.


Image Format

ggsave(filename = "box1.jpeg", d)

ggsave(filename = "box1.png", d)

PDF Format

ggsave(filename = "box1.pdf", d)
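ggsave() also accepts the output size and resolution. The sketch below writes to a temporary directory so it does not clutter the working directory; it uses R's built-in mtcars dataset as a stand-in, and the width, height, and dpi values are arbitrary choices for illustration.

```r
library(ggplot2)

# Built-in mtcars data used as a stand-in for `data`.
d <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

out_png <- file.path(tempdir(), "box1.png")
out_pdf <- file.path(tempdir(), "box1.pdf")

# width and height are in inches by default; dpi applies to raster formats.
ggsave(filename = out_png, plot = d, width = 6, height = 4, dpi = 300)
ggsave(filename = out_pdf, plot = d, width = 6, height = 4)

file.exists(c(out_png, out_pdf))
```

Passing the plot explicitly with `plot = d` avoids relying on whichever plot happened to be displayed last.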


Alternatively, save from the graphical interface: in the plot output window, choose Export, then save as an image or as a PDF (you can select the aspect ratio for the image).






