Intro

This course includes both example code and exercises to help you learn the concepts.
Before starting, please make sure the following libraries are loaded. If you cannot load a
library, you probably don’t have it installed, and should use the install.packages() function to install it.
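As a rough sketch of that check, the snippet below lists which of the course packages are not yet installed. The package names are taken from the library() calls in this document; ggplot2movies is an addition here, on the assumption that you run a recent ggplot2, which no longer bundles the movies dataset used later. The variable names pkgs and missing_pkgs are only illustrative.

```r
# Packages used in this course. ggplot2movies is an assumption: it supplies
# the movies dataset for ggplot2 2.0.0 and later.
pkgs <- c("ggplot2", "plyr", "grid", "reshape2", "gridExtra",
          "MASS", "HistData", "ggplot2movies")

# requireNamespace() returns FALSE instead of raising an error when a
# package is not installed, so it is safe to use as a check.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# Uncomment to install whatever is missing (needs network access):
# install.packages(missing_pkgs)
```

Once a package is installed, library() loads it for the current session; installing has to be done only once per machine.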

ggplot2 basics

ggplot2 was developed by Hadley Wickham, a notable R programmer and contributor to RStudio.
The project is stable, and should be available for the long term.
Use http://docs.ggplot2.org/current/ as a reference.
http://www.r-bloggers.com/ is a collection of recent articles on R.
To save a plot, use ggsave().
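A minimal sketch of saving a plot is shown below. The file name, size and resolution are arbitrary choices for illustration, and the plot is written to a temporary directory so nothing in your working directory is touched.

```r
library(ggplot2)

# Keep the plot in a variable so it can be passed to ggsave() explicitly.
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) + geom_point()

# ggsave() picks the output format from the file extension.
out_file <- file.path(tempdir(), "iris_scatter.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 150)
```

If the plot argument is left out, ggsave() saves the last plot that was displayed.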

#source("http://bib.umassmed.edu/~wespisea/rCourse/adamGgplot2/answers.R")
library(ggplot2)
library(plyr)
library(grid)
library(reshape2)
library(gridExtra)
library(MASS)
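library(HistData)
library(ggplot2movies)  # provides the movies dataset, which ggplot2 2.0.0 and later no longer bundle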

Load the datasets…

Scatter Plots

To create a scatter plot using qplot you would use:

qplot(data=iris,x=Sepal.Width,y=Sepal.Length,geom="point")

The equivalent call using ggplot() is:

ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length)) + geom_point() 

Question 1

Use the women dataset to plot average heights and weights for American women.
First, check the contents of the dataset using str(women) and use help(women) to learn about the data. Check the answer by cat(answer1).

Aesthetic Mappings

We can map aesthetics such as color and shape to columns of a data frame, or set them to arbitrary values.

ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length,shape=Species)) + geom_point()

ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length,color=Species)) + geom_point()

Question 2

Using the birthwt dataset, what is the distribution of age vs. weight?
By coloring the points, is low birth weight associated with either variable?
Check the answer by cat(answer2).

Set aesthetics to fixed values:

ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length)) + geom_point(color="red",size=4)  

ggplot(diamonds,aes(x=carat,y=price))+ geom_point(alpha=0.4)

ggplot(diamonds,aes(x=carat,y=price))+ geom_point(alpha=0.1)
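The difference between setting and mapping an aesthetic is easy to get wrong, so here is a small sketch of both. The names p_set and p_map are only illustrative; layer_data() is used to look at the colors ggplot2 actually computes for each point.

```r
library(ggplot2)

# Setting: the color is a fixed value passed to the geom, outside aes().
p_set <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(color = "red")

# Mapping: inside aes(), "red" is treated as data (a variable with a single
# value), so ggplot2 assigns a color from its default scale and adds a legend.
p_map <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = "red"))

# Colors computed for the points in each plot.
colors_set <- unique(layer_data(p_set)$colour)
colors_map <- unique(layer_data(p_map)$colour)
print(colors_set)
print(colors_map)
```

Only the first plot is drawn in the requested red; in the second, the string is just a label that happens to read "red".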

Layered geoms

ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length,color=Species,shape=Species)) + 
  geom_point(size=4) + 
  geom_point(size=2,color="grey")

Line Plot

Note that the aesthetic mappings work the same way.

ggplot(economics,aes(x=date,y=uempmed))+geom_line()

Question 3

Plot the number of Typhoid Fever deaths per year using the epi dataset. If you are stuck, take a look at cat(hint3) and cat(hint3.1) for the data transforms. Check the answer by cat(answer3).

Add labels and a title:

ggplot(economics,aes(x=date,y=uempmed))+ geom_line() +
  xlab("Time") + 
  ylab("Median Unemployment Duration (weeks)") + 
  ggtitle("Median Unemployment Duration vs Time\n1967-2007")
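As an alternative sketch, the axis labels and the title can be set in a single labs() call. The label text here is a suggestion based on the dataset documentation, which describes uempmed as the median duration of unemployment in weeks.

```r
library(ggplot2)

# labs() sets the axis labels and the title in one call.
p <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line() +
  labs(
    x = "Time",
    y = "Median unemployment duration (weeks)",
    title = "Median unemployment duration over time"
  )
```

Whether to use xlab(), ylab() and ggtitle() or a single labs() call is a matter of taste; both give the same plot.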

Question 4

Add labels and a title to the previous plot. Example solution: cat(answer4).

Bar Plot

ggplot(mtcars,aes(x=cyl))+geom_bar()

Use a factor instead of a numeric type so that the x axis is treated as discrete:

ggplot(mtcars,aes(x=factor(cyl)))+geom_bar()
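To see why the factor version looks different, it can help to inspect what factor() does to the column. This short sketch uses only base R; cyl_factor and cyl_counts are illustrative names.

```r
# cyl is stored as a number, so ggplot2 draws a continuous x axis for it.
str(mtcars$cyl)

# As a factor it has exactly three levels, so the axis shows only
# the categories 4, 6 and 8, with no empty positions for 5 and 7.
cyl_factor <- factor(mtcars$cyl)
print(levels(cyl_factor))

# The counts per level are the bar heights that geom_bar() draws.
cyl_counts <- table(cyl_factor)
print(cyl_counts)
```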

Color the bars by a second variable:

ggplot(mtcars,aes(x=factor(cyl),fill=factor(gear)))+geom_bar()

Another, better example:

ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()

Question 5

What is the distribution of cases of typhoid per week? What about per year?
See cat(answer5) and cat(answer5.1)

Density Plot

ggplot(movies, aes(length))+geom_density()

Adjust x and y limits

We need to “zoom in” to get a reasonable view. Then create distributions of short vs. non-short movies:

ggplot(movies, aes(x=length,fill=factor(Short)))+geom_density() + xlim(0,200)
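One caveat worth knowing: xlim() does not just zoom, it drops the rows outside the limits before the density is estimated. The sketch below contrasts it with coord_cartesian(), which only changes the visible window. It uses the diamonds dataset as a stand-in because that dataset ships with ggplot2; the limits of 0 to 2 carat are arbitrary.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = carat)) + geom_density()

# xlim() sets the scale limits: rows with carat > 2 are removed
# (with a warning) before the density is estimated.
p_clip <- base + xlim(0, 2)

# coord_cartesian() only changes the visible window: the density is
# still estimated from every row.
p_zoom <- base + coord_cartesian(xlim = c(0, 2))

# Compare the x range over which each density was actually computed.
range_clip <- range(suppressWarnings(layer_data(p_clip))$x)
range_zoom <- range(layer_data(p_zoom)$x)
print(range_clip)
print(range_zoom)
```

For a quick look either approach is fine, but the curves can differ in shape because they are estimated from different subsets of the data.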

Set the transparency of the distributions

ggplot(iris, aes(x=Sepal.Width,fill=Species))+geom_density(alpha=0.4)

Question 6a

High urine levels of glycosaminoglycans (GAGs) can progressively damage tissue,
and high levels are indicative of inherited disease. Using the GAGurine dataset
from the MASS package, what is the distribution of GAG assay values?
Check the answer by cat(answer6a).

Question 6b

What’s the difference in GAG distribution between children under 2 and
all other children?
See cat(hint6b) for the data transform.
Check the answer by cat(answer6).

Boxplot

ggplot(movies[movies$year > 1990,],aes(x=factor(year),y=rating)) + 
  geom_boxplot()

ggplot(movies[movies$year > 1990,],aes(x=factor(year),y=rating,fill=factor(Short))) + 
  geom_boxplot()