This course includes both example code and exercises for you to learn the concepts.
Before starting, please make sure the following libraries are loaded. If you cannot load a
library, you probably don’t have it installed, and should use the install.packages()
function to install it.
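The sketch below handles that check in one step. It compares the course packages against installed.packages() and installs only the ones that are missing. The package list mirrors the library() calls below, with grid left out because it ships with R itself and is not installed from CRAN. The repos value (the CRAN cloud mirror) is an illustrative choice so that the call also works in non-interactive sessions.

```r
# Course packages (grid is omitted: it is part of base R)
pkgs <- c("ggplot2", "plyr", "reshape2", "gridExtra", "MASS", "HistData")

# Names of all packages already installed in the library paths
installed <- rownames(installed.packages())

# Install only what is missing; nothing happens if everything is present
missing_pkgs <- setdiff(pkgs, installed)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```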
ggplot2 was developed by Hadley Wickham, a notable R programmer and contributor to RStudio.
The project is stable, and should be available for the long term.
Use http://docs.ggplot2.org/current/ as a reference.
http://www.r-bloggers.com/ is a collection of recent articles on R.
To save a plot, use ggsave().
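Here is a minimal sketch of saving a plot with ggsave(). The file name iris_scatter.pdf, the temporary directory and the 6 x 4 inch size are arbitrary choices for illustration. If no plot argument is given, ggsave() saves the most recently created or modified plot.

```r
library(ggplot2)

# Build the plot as an object so it can be passed to ggsave() explicitly
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) + geom_point()

# The file type is inferred from the extension; width and height are in inches by default
out_file <- file.path(tempdir(), "iris_scatter.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Changing the extension to .png produces a raster image instead. In that case the dpi argument controls the resolution.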
#source("http://bib.umassmed.edu/~wespisea/rCourse/adamGgplot2/answers.R")
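library(ggplot2)       # plotting
library(plyr)          # split-apply-combine data manipulation
library(grid)          # low-level grid graphics (ships with R)
library(reshape2)      # reshaping data between wide and long formats
library(gridExtra)     # arranging several plots on one page
library(MASS)          # birthwt and GAGurine datasets
library(HistData)      # datasets from the history of statistics
library(ggplot2movies) # movies dataset (moved out of ggplot2 in version 2.0.0)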
Load the datasets…
To create a scatter plot using qplot() you would use the following (note that qplot() has been deprecated since ggplot2 3.4.0):
qplot(data=iris,x=Sepal.Width,y=Sepal.Length,geom="point")
With the full ggplot() syntax, the command is:
ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length)) + geom_point()
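Because ggplot() returns a plot object, you can also build a plot in stages. Store the base (data and aesthetic mappings) in a variable, then add layers with +. A small sketch follows; the variable names base and p are arbitrary.

```r
library(ggplot2)

# Data and aesthetic mappings only; printing this would show an empty panel
base <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length))

# Adding a geom returns a new object; base itself is unchanged
p <- base + geom_point()

# Printing a ggplot object draws it (at the console this happens automatically)
print(p)
```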
Use the women dataset to plot average heights and weights for American women.
First, check the contents of the dataset using str(women) and use help(women) to learn about the data. Check the answer with cat(answer1).
We can map aesthetics such as color and shape to variables in a data frame, or set them to fixed values:
ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length,shape=Species)) + geom_point()
ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length,color=Species)) + geom_point()
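Mapped variables do not have to be categorical. In this sketch, the numeric Petal.Length column is mapped to color, which produces a continuous color gradient instead of one color per species. Shape is different: it only accepts discrete variables, and mapping a continuous column to it raises an error.

```r
library(ggplot2)

# Petal.Length is numeric, so color uses a continuous gradient scale
# and the legend becomes a color bar instead of a set of keys
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Petal.Length)) +
  geom_point()
print(p)
```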
Using the birthwt dataset (from the MASS package), what is the distribution of age vs. weight?
By coloring the points, is low birth weight associated with either variable?
Check the answer with cat(answer2).
ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length)) + geom_point(color="red",size=4)
ggplot(diamonds,aes(x=carat,y=price))+ geom_point(alpha=I(0.4))
ggplot(diamonds,aes(x=carat,y=price))+ geom_point(alpha=I(0.1))
ggplot(iris,aes(x=Sepal.Width,y=Sepal.Length,color=Species,shape=Species)) +
geom_point(size=4) +
geom_point(size=2,color="grey")
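Setting a fixed value outside aes(), as in the examples above, is not the same as mapping. A common mistake is to put a constant inside aes(). The sketch below contrasts the two.

```r
library(ggplot2)

# Setting: color is a fixed parameter of the geom, so every point is red and no legend is drawn
set_red <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(color = "red")

# Mapping a constant: "red" is treated as a one-level data value, so the
# points get the default palette's first color and a legend is added
map_red <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = "red")) +
  geom_point()

print(set_red)
print(map_red)
```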
For line plots, the aesthetic mappings work the same way:
ggplot(economics,aes(x=date,y=uempmed))+geom_line()
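Because the mappings work the same way, several series can share one line plot once the data are in long format. This sketch uses melt() from the reshape2 package (loaded above) to stack two economics columns, psavert and uempmed, into variable/value pairs, then maps the series name to color. The two columns are chosen only because their numeric ranges are similar; they measure different things.

```r
library(ggplot2)
library(reshape2)

# Keep the date plus two series, as a plain data frame
econ <- as.data.frame(economics[, c("date", "psavert", "uempmed")])

# melt() produces columns: date, variable (series name), value
econ_long <- melt(econ, id.vars = "date")

p <- ggplot(econ_long, aes(x = date, y = value, color = variable)) +
  geom_line()
print(p)
```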
Plot the number of typhoid fever deaths per year using the epi dataset. If you are stuck, take a look at cat(hint3) and cat(hint3.1) for the data transforms. Check the answer with cat(answer3).
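# uempmed is the median duration of unemployment, in weeks (not a rate)
ggplot(economics, aes(x = date, y = uempmed)) + geom_line() +
  xlab("Time") +
  ylab("Median Unemployment Duration (weeks)") +
  ggtitle("Median Unemployment Duration vs Time\n1967-2007")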
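The three label functions can also be replaced by a single labs() call. The sketch below builds the same plot; the label text is illustrative. In the economics data, uempmed is the median duration of unemployment in weeks.

```r
library(ggplot2)

# labs() sets several labels in one call, equivalent to combining
# xlab(), ylab() and ggtitle(); it also accepts a subtitle argument
p <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line() +
  labs(x = "Time",
       y = "Median Unemployment Duration (weeks)",
       title = "Median Unemployment Duration vs Time")
print(p)
```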
Add labels and a title to your typhoid fever plot from the previous exercise. Example solution: cat(answer4).
# cyl is numeric, so the x-axis is continuous; geom_bar() counts each distinct value
ggplot(mtcars, aes(x = cyl)) + geom_bar()
ggplot(mtcars,aes(x=factor(cyl)))+geom_bar()
ggplot(mtcars,aes(x=factor(cyl),fill=factor(gear)))+geom_bar()
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
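By default, filled bars are stacked. This sketch reuses the mtcars example with position = "dodge", which places the bars for each gear side by side so the counts are easier to compare across groups.

```r
library(ggplot2)

# position = "dodge" draws one bar per (cyl, gear) combination side by side
# instead of stacking them (the default is position = "stack")
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "dodge")
print(p)
```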
What is the distribution of cases of typhoid per week? What about per year?
See cat(answer5) and cat(answer5.1).
ggplot(movies, aes(length))+geom_density()
The length distribution has a long right tail, so we need to “zoom in” with xlim() to get a reasonable view. Here we also create distributions of short vs. non-short movies:
ggplot(movies, aes(x=length,fill=factor(Short)))+geom_density() + xlim(0,200)
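Note that xlim() removes observations outside the range before the density is estimated, and prints a warning about removed rows. If you only want to zoom the view while still estimating the density from all the data, coord_cartesian() is an alternative. This sketch uses the built-in diamonds data, whose price distribution also has a long right tail, rather than movies. The 5000 cutoff is arbitrary.

```r
library(ggplot2)

# xlim(): prices above 5000 are dropped (with a warning) before the density is estimated
p_limit <- ggplot(diamonds, aes(x = price)) +
  geom_density() +
  xlim(0, 5000)

# coord_cartesian(): the density uses every diamond; only the visible range is cut
p_zoom <- ggplot(diamonds, aes(x = price)) +
  geom_density() +
  coord_cartesian(xlim = c(0, 5000))

print(p_limit)
print(p_zoom)
```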
ggplot(iris, aes(x=Sepal.Width,fill=Species))+geom_density(alpha=I(0.4))
High levels of glycosaminoglycans (GAGs) can progressively damage tissue,
and high urine levels are indicative of inherited disease. Using the GAGurine
dataset from the MASS package, what is the distribution of GAG assay values?
Check the answer with cat(answer6a).
What is the difference in GAG distribution between children under 2 and
all other children? See cat(hint6b) for the data transform.
Check the answer with cat(answer6).
ggplot(movies[movies$year > 1990,],aes(x=factor(year),y=rating)) +
geom_boxplot()
ggplot(movies[movies$year > 1990,],aes(x=factor(year),y=rating,fill=factor(Short))) +
geom_boxplot()
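Boxplots summarize each group but hide how many observations it contains. A common refinement is to overlay the individual points with geom_jitter(). This sketch uses the built-in mtcars data rather than movies. Setting outlier.shape = NA prevents outliers from being drawn twice, and the jitter width of 0.2 is an arbitrary choice.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +                  # hide outliers; the jittered points show them
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)   # spread points horizontally only
print(p)
```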