Posts

Showing posts from January, 2018

How to edit data manually?

Say we are working on the mtcars data.
Code: mtcars <- edit(mtcars)
This opens the table in an editor where we can change the values manually. When the editor is closed, the edited values are saved back to mtcars.

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach to analysing data sets to summarise their main characteristics, often with visual methods. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA differs from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and on handling missing values and making transformations of variables as needed.

The purpose of exploratory data analysis is to: check for missing data and other mistakes; gain maximum insight into the data set and its underlying structure; uncover a parsimonious model, one which explains the data with a minimum number of predictor variables; check assumptions associated with any model fitting or hypothesis test; create a list of outliers or other anomalies; find parameter estimates and their associated confidence intervals.
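A few of these goals can be sketched in base R. This is only a minimal illustration, assuming the built-in mtcars data; the missing value is introduced artificially and the choice of the hp and mpg columns is arbitrary.

```r
# Base R only; mtcars ships with R, so no extra packages are assumed.
df <- mtcars

# Introduce one missing value purely for illustration (not in the real data).
df$mpg[3] <- NA

# Check for missing data: count of NA values per column.
missing_per_column <- colSums(is.na(df))
print(missing_per_column)

# Gain insight into the structure: column types and per-column summaries.
str(df)
print(summary(df))

# List potential outliers in one variable using the boxplot rule
# (points more than 1.5 * IQR beyond the hinges).
hp_outliers <- boxplot.stats(df$hp)$out
print(hp_outliers)

# Parameter estimate with a confidence interval: 95% CI for mean mpg.
# t.test() drops the NA value before computing the interval.
mpg_ci <- t.test(df$mpg)$conf.int
print(mpg_ci)
```

Which checks matter most depends on the data set; the point is that each goal in the list maps to one or two short commands.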

View Data from Data Frames

Say the data is stored in a data frame named v1.
To see the first 6 rows. Code: head(v1)
To see the first 10 rows. Code: head(v1, n=10) or head(v1, 10)
To see the last 6 rows. Code: tail(v1)
To see the last 10 rows. Code: tail(v1, n=10) or tail(v1, 10)
To get the column names. Code: colnames(v1)
To get the row names. Code: rownames(v1)
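As a quick runnable sketch of these commands, the built-in mtcars data can stand in for v1 (that substitution is an assumption made only for this example):

```r
# mtcars is used here as a stand-in for your own data frame.
v1 <- mtcars

first_six <- head(v1)          # first 6 rows (the default)
first_ten <- head(v1, n = 10)  # first 10 rows
last_six  <- tail(v1)          # last 6 rows
last_ten  <- tail(v1, 10)      # last 10 rows

print(first_six)
print(colnames(v1))            # column names
print(rownames(first_six))     # row names of the first 6 rows
```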

Read Data

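Format type: CSV
Code: read.table("c:/mydata.csv", header=TRUE, sep=",")
To pick the file from a dialog instead. Code: read.csv(file = file.choose())
After setting the working directory, the file name alone is enough. Code: read.table("file name", header=TRUE, sep=",")
Other formats: for Excel files use the "readxl" package; for SPSS, SAS and Stata files use the "haven" package.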
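The two CSV calls can be compared with a small self-contained sketch. A temporary file is used in place of a real path such as c:/mydata.csv; the file contents (a few rows of mtcars) are an assumption made only so the example runs anywhere.

```r
# Write a small CSV to a temporary location so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_path, row.names = FALSE)

# read.table() needs the header and separator spelled out for CSV files.
from_table <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv() is a wrapper around read.table() with CSV-friendly defaults.
from_csv <- read.csv(csv_path)

print(from_table)
print(all.equal(from_table, from_csv))
```

For everyday CSV files read.csv() is usually the shorter choice; read.table() is handy when the separator is something other than a comma.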

Working Directory

First find out where your working directory is currently set.
Code: getwd()
To change the path.
Code: setwd("<location of your dataset>")
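A minimal sketch of switching the working directory and switching back. The temporary directory here is only a placeholder for the location of your dataset.

```r
# Remember where we started so we can return afterwards.
old_dir <- getwd()
print(old_dir)

# tempdir() is a placeholder for the folder that holds your data.
new_dir <- tempdir()
setwd(new_dir)
print(getwd())

# Restore the original working directory.
setwd(old_dir)
```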

Useful Libraries

"dplyr"
To use this, the data must be tidy. The dplyr package provides a concise set of operations for managing data frames. With these functions we can do a number of complex operations in just a few lines of code. In particular, we can often conduct the beginnings of an exploratory analysis with the powerful combination of group_by() and summarize(). One important contribution of the dplyr package is that it provides a "grammar" (in particular, verbs) for data manipulation and for operating on data frames.
Pipe operator: %>%
It can perform: select, filter, sorting (arrange), rename, mutate, group_by.

"ggplot2"
Scatter plot: geom_point()
Histogram: geom_histogram()
Density: geom_density()
Boxplot: geom_boxplot()
Line only: just remove geom_point() from the first example below.
Code:
1. ggplot(data, aes(x=quantity, y=price)) + geom_point() + geom_smooth()
2. ggplot(data, aes(x=quantity, y=price, color = size, size = variable)) + geom_point()
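The dplyr verbs listed above can be chained with the pipe. The sketch below assumes the dplyr package is installed and uses the built-in mtcars data; the hp threshold and the kpl column are made up for illustration.

```r
library(dplyr)

mpg_by_cyl <- mtcars %>%
  select(mpg, cyl, hp) %>%                     # keep three columns
  filter(hp > 100) %>%                         # keep cars above 100 horsepower
  mutate(kpl = mpg * 0.425) %>%                # approx. miles/gallon to km/litre
  group_by(cyl) %>%                            # one group per cylinder count
  summarize(mean_kpl = mean(kpl), n = n()) %>% # group mean and group size
  arrange(desc(mean_kpl))                      # sort, highest mean first

print(mpg_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what makes the chain read from top to bottom.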

Installing Packages

In the R console, type install.packages("PackageName") and hit Enter; the package will install.
To use it, type library(PackageName) and hit Enter; the library will load and can be used as required.
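A common pattern is to install a package only when it is missing. The helper below, ensure_package, is a hypothetical name introduced for this sketch and is not part of R; installing a package that is actually missing assumes a working internet connection and CRAN mirror.

```r
# ensure_package() is a hypothetical helper, not a built-in R function.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE if the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available after the attempt.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call does not trigger an install.
ok <- ensure_package("stats")
print(ok)
```

After the check succeeds, library(PackageName) loads the package as usual.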