Friday, November 8, 2013

Visualizing Fatty Acid Compositions with R

Disclosure: I'm a paid-up convert to the "Paleo" lifestyle. My experience of that is fodder for another post (or maybe a hundred of them) but I pretty quickly dropped 20kg (and what felt like 10 years) to get to a roughly ideal weight of 80kg, and have kept it off for a year.

Anyone who's interested in Paleo should visit the unbelievably useful Mark's Daily Apple.

Anyway - that journey led me to try to learn as much about nutrition as possible. This is an area where there's a LOT of information cyber-circulating the world: often conflicting, and mashed up with vested interests, received wisdom, and lifestyle schools which sometimes feel more like religions.

So I recently (in the last week) decided to learn R. The thing that struck me immediately is that it's ideal for pulling in data of almost any size and shape and using it to present a great illustration of almost any point (provided the data actually supports it!).

One of the most interesting things I've learned via Paleo is what a huge effect the types and ratios of different fats in your diet can have on all aspects of your overall health. The principal indicator to watch out for is the ratio of Omega-6 to Omega-3. It should be as close to 1:1 as you can get, but many modern diets push it up as high as 30:1. More info here, but the short version is that it's A Bad Thing.
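
To make the ratio concrete, here is a minimal R sketch of the arithmetic. The fat names, column names and numbers are made-up placeholders chosen purely for illustration; they are not taken from LIPIDS.txt or from any real nutritional table.

```r
# Hypothetical example values (grams per 100 g); NOT real nutritional data
fats <- data.frame(
  Fat_Oil = c("Oil A", "Oil B"),
  Omega6  = c(30, 4),
  Omega3  = c(1, 2)
)

# Omega-6 to Omega-3 ratio: the closer to 1, the better
fats$Ratio <- fats$Omega6 / fats$Omega3
print(fats)
```

In this toy case "Oil A" sits at the 30:1 end of the scale, while "Oil B" is much closer to the 1:1 target.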

So here's a quick R project: I wanted to create a graphic which showed the proportion of different fatty acids in typical oils and fats used in cooking. I also wanted it to emphasize the positive effect of Omega-3, intuitively display the ratio to Omega-6, and give some subordinate information on the other major fat categories. R's flexibility allowed me to tweak the output until the chart not only displayed the data, but also presented the key informational messages.

This really brought home to me that it's useful to have a tool with complete flexibility - it let me fiddle with colours, scales and relative positions so that the chart gave as clear a picture [as I could get to anyhow] of both the actual data and the semantic information that I thought was important to get across. Total time was a few hours, including learning curve - which IMO is thanks to a brilliant open-source platform and a real wealth of specific forum information that's easy to find.

The data, which I scraped from a few sources, is here. (LIPIDS.txt) [I think it's close to correct, but no warranties!]
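
For orientation, here is a rough sketch of the layout the script assumes: a tab-separated file with a header row, the fat or oil name in the first column and one numeric column per lipid type after it. The two rows, four column names and values below are hypothetical stand-ins (the script itself reads eight lipid columns, 2 to 9). The sketch also shows how blank cells become 0 and how each row can be rescaled to percentages.

```r
# Hypothetical two-row stand-in for LIPIDS.txt (tab-separated, header row)
txt <- "Fat_Oil\tSFA\tMUFA\tOmega6\tOmega3\nOil A\t10\t20\t15\t5\nOil B\t40\t\t8\t2\n"
toy <- read.table(text = txt, header = TRUE, sep = "\t", fill = TRUE)

# Blank numeric cells are read as NA; treat them as 0
toy[is.na(toy)] <- 0

# Divide each row by its own row sum (MARGIN = 1), then express as a percentage
pct <- sweep(toy[, 2:5], 1, rowSums(toy[, 2:5]), "/") * 100
print(cbind(toy[, 1, drop = FALSE], pct))
```

After this step every row of pct adds up to 100, so fats with different absolute totals can be compared on the same scale.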


Commented R code below:


# load the required libraries (library() stops with an error if one is missing)
library(ggplot2)
library(reshape2)
 
# set the filename and the sep parameters
tfile<-"LIPIDS.txt"
fileSep<-"\t"
 
#load file
LIPIDS<-read.table(tfile,header=TRUE,sep=fileSep,fill=TRUE)
LIPIDS<-data.frame(LIPIDS)
 
#make sure any missing / NA variables set to 0
LIPIDS[is.na(LIPIDS)]<-0
 
#melt the data into a flat structure, scaling each fat (row) to percentages
#MARGIN=1 so that each row is divided by its own row sum
meltLIPID<-
      melt(
        cbind(
          LIPIDS[,1],sweep(
            LIPIDS[,2:9],1,rowSums(
              LIPIDS[,2:9],na.rm=TRUE
              ),"/")
          *100)
        )
 
#name the columns something sensible
colnames(meltLIPID)<-(c("Fat_Oil","Lipid","Content"))
#replace spaces in the Fat type labels with line breaks
meltLIPID$Fat_Oil=gsub("( +)", "\n ",meltLIPID$Fat_Oil)
 
#create the basic ggplot2 object
g.lipid <- ggplot(data=meltLIPID[order(meltLIPID$Lipid),],aes(x=Fat_Oil, y=Lipid))
# add some basic theme elements to get the look
g.lipid <- g.lipid  + theme_bw(base_size = 16, base_family = "mono") + 
                      theme(panel.grid.major = element_line(colour = "green4", linetype="longdash", size=0.3))
# plot points showing the %composition of each Fat by Lipid Type, size by %, make part transparent
g.lipid <- g.lipid + geom_point(aes(colour = Lipid, size = Content), alpha=0.6) 
# scale up the size and give the size legend a title
g.lipid <- g.lipid + scale_size_area(max_size = 37, name = "Proportion (%)")
# make a manual colour scale to give emphasis to PUFA and 3-6
g.lipid <- g.lipid  + scale_colour_manual(
        values=c(
            "skyblue",
            "skyblue1",
            "skyblue2",
            "skyblue3",
            "skyblue4",
            "slateblue",
            "orangered2",
            "green3"))
 
#modify labels
g.lipid <- g.lipid + theme(text = element_text(size=15),
                           axis.text.x=element_text(angle=-25,face="bold"),
                           axis.text.y=element_text(angle=0,face="bold")
                           )
g.lipid <- g.lipid + xlab("Common Cooking Fats & Oils") + 
                     ylab("Lipid Type\n") + 
                     ggtitle("FAT & OIL BREAKDOWN BY LIPID COMPOSITION\n") + 
                     theme(plot.title = element_text(face="bold"))
#make it a polar plot
g.lipid <- g.lipid + coord_polar()
#adjust the legends (override.aes expects a list)
g.lipid <- g.lipid  + guides(colour = guide_legend(title="Lipid Key",override.aes = list(size = 12, shape=19)))
g.lipid <- g.lipid  + guides(size = guide_legend(override.aes = list(colour="green3")))
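#add a faint reference ring at the 8th lipid level; constant size and alpha go outside aes() so they are not treated as data
g.lipid <- g.lipid + geom_hline(aes(yintercept=8, colour=Lipid), size=0.05, alpha=0.2)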
 
#print the graph
g.lipid