Monday, November 25, 2013

Dynamically creating and manipulating images in ggplot2

This post was inspired by a question on Stack Overflow, which asked whether it is possible to plot a stacked bar chart in which each bar segment is overlaid with a rendered image or label.

The example given was a subset of the 'movies' data from IMDB: movies with more than 100,000 rating votes, grouped by year and rating bucket. A basic bar chart representation is shown below:



When I looked at the question (and at whether the data could be reshaped into a 'motif' chart), it seemed like an interesting exercise for exploring a few things:

  • Importing and using custom fonts in R
  • Dynamically generating images from a list of variables (in this case, years)
  • Creating custom grobs which could then be rendered in ggplot

The chart I managed to produce is here, followed by the commented code:


#library() stops with an error if a package is missing; require() only warns
library(ggplot2)
library(png)
library(plyr)
library(grid)
library(extrafont)
 
#run this once only, to register the font with extrafont:
#font_import(pattern="Show")
#load the fonts (device="win" is specific to Windows)
loadfonts(device="win")
 
#create a subset of data with big votes
#(since ggplot2 2.0.0 the movies data ships in the separate ggplot2movies package)
big_votes_movies <- movies[movies$votes > 100000,]
 
#create one rainbow colour per unique year (label) and store it alongside the year
years<-data.frame(year=unique(big_votes_movies$year))
years$col<-rainbow(nrow(years))
 
#function to create the labels as png files
writeYear<-function(year,col){
 
  png(filename=paste(year,".png",sep=""),width=440,height=190,bg="transparent")
  im<-qplot(1,1,xlab=NULL,ylab=NULL,geom="blank") + 
    geom_text(label=year,size=70, family="Showcard Gothic", color=col,alpha=0.8) +
    theme(axis.text.x = element_blank(),axis.text.y = element_blank()) +
    theme(panel.background = element_rect(fill = "transparent",colour = NA), 
          plot.background = element_rect(fill = "transparent",colour = NA), 
          panel.grid.minor = element_line(colour = "transparent"), 
          panel.grid.major = element_line(colour = "transparent"),
          axis.ticks=element_blank())
  print(im)
  dev.off()
}
 
#call the function to create the placeholder images
apply(years,1,FUN=function(x)writeYear(x["year"],x["col"]))
 
#summarize the data, and create bins manually
summarydata<-big_votes_movies[,c("year","rating","votes")]
summarydata$rating<-cut(summarydata$rating,breaks=c(0,8,8.5,9,Inf),labels=c(0,8,8.5,9))
 
aggdata <- ddply(summarydata, c("year", "rating"), summarise, votes  = sum(votes) )
aggdata<-aggdata[order(aggdata$rating),]
aggdata<-ddply(aggdata,.(rating),transform,ymax=cumsum(votes),ymin=c(0,cumsum(votes))[1:length(votes)])
#identify the image placeholders (paste0 is vectorised, so no apply() is needed)
aggdata$imgname<-paste0(aggdata$year,".png")
ymax<-max(aggdata$ymax)
 
#do the basic plot
z<-qplot(x=10,y=10,geom="blank",xlab="Rating",ylab="Votes \n",main="Big Movie Votes \n") + 
  theme_bw() +
  theme(panel.grid.major = element_line(colour = "transparent"),
        text = element_text(family="Kalinga", size=20,face="bold")        
        ) +
  scale_x_continuous(limits=c(8,9.5)) + 
  scale_y_continuous(limits=c(0,ymax))  
 
#create a function to build the grobs and return annotation_custom() calls
callgraph<-function(df){
  tiles<-apply(df,1,FUN=function(x){
    #each image fills the rectangle from (rating, ymin) to (rating+0.5, ymax)
    img<-rasterGrob(image=readPNG(x["imgname"]),
                    x=0,y=0,height=1,width=1,just=c("left","bottom"))
    annotation_custom(img,
                      xmin=as.numeric(x["rating"]),xmax=as.numeric(x["rating"])+0.5,
                      ymin=as.numeric(x["ymin"]),ymax=as.numeric(x["ymax"]))
  })
  return(tiles)
}
#add the tiles to the base chart
z+callgraph(aggdata)
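
The binning and stacking step is probably the least obvious part of the code above, so here is a minimal sketch of it on a made-up data frame, using base R only. The `toy` data and the `stack_extents` helper are hypothetical names introduced for illustration and are not part of the original code; the breaks and labels match the ones used above.

```r
# Toy data standing in for big_votes_movies (all values are invented)
toy <- data.frame(
  year   = c(1994, 1999, 2008, 1972, 2010),
  rating = c(8.3, 8.2, 8.9, 9.1, 8.7),
  votes  = c(150000, 120000, 400000, 300000, 250000)
)

# Bin ratings as above: each label is the lower edge of its bin
toy$bin <- cut(toy$rating, breaks = c(0, 8, 8.5, 9, Inf), labels = c(0, 8, 8.5, 9))

# Hypothetical helper: within one bin, each tile starts where the previous one ended
stack_extents <- function(df) {
  df$ymax <- cumsum(df$votes)
  df$ymin <- df$ymax - df$votes
  df
}

# Apply the helper per bin (dropping empty bins) and recombine
stacked <- do.call(rbind, lapply(split(toy, toy$bin, drop = TRUE), stack_extents))
rownames(stacked) <- NULL

# Factor labels must go through as.character() before as.numeric()
stacked$xmin <- as.numeric(as.character(stacked$bin))
stacked$xmax <- stacked$xmin + 0.5
print(stacked)
```

Each row of `stacked` then describes one rectangle (`xmin`, `xmax`, `ymin`, `ymax`), which is the same information that `callgraph()` passes to `annotation_custom()` for each image.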