Non-model organism – annotations in Bioconductor
It took me a couple of days of searching the web and trying to build my own annotation database for a non-model organism before I realized that Bioconductor already has a mechanism that automates the whole process.
In this particular example, I needed to build an annotation database for tomato, Solanum lycopersicum. Here are the steps needed to build it automatically:
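source("http://bioconductor.org/biocLite.R")  # legacy installer script; newer Bioconductor releases use the BiocManager package instead
library(AnnotationForge)
# Build the annotation package from NCBI data (tax_id 4081 is Solanum lycopersicum)
makeOrgPackageFromNCBI(version="0.0.1", author="PS", maintainer="PS <ps@pawelszczesny.org>", outputDir=".", tax_id=4081, genus="Solanum", species="lycopersicum")
# here you need to wait a bit, a few hours maybe
install.packages("org.Slycopersicum.eg.db", repos=NULL, type="source")
library("org.Slycopersicum.eg.db")
# Take the first few keys and look up their annotations;
# columns(org.Slycopersicum.eg.db) lists what your build actually contains
sample_keys <- head(keys(org.Slycopersicum.eg.db))
head(select(org.Slycopersicum.eg.db, keys=sample_keys, columns=c("ENTREZID","ACCNUM","ALIAS","CHR","PMID","REFSEQ","SYMBOL","UNIGENE","GENENAME")))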
You can now use the package for downstream functional analysis, for example of microarray data. The source code is available as a gist.
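To give a rough idea of what such a downstream step might look like, here is a minimal sketch that annotates a list of genes with symbols, names and GO terms. It assumes the package built above is installed. The gene list is only a placeholder (the first 20 Entrez Gene IDs in the database), and the "GO" column is an assumption about what the build contains, so the code only requests columns that are actually present.

```r
library(AnnotationDbi)
library(org.Slycopersicum.eg.db)

db <- org.Slycopersicum.eg.db

# Placeholder gene list: the first 20 Entrez Gene IDs in the database.
# In a real analysis this would be your own list of genes of interest.
gene_ids <- head(keys(db, keytype = "ENTREZID"), 20)

# Only request columns that the generated package actually provides.
wanted <- intersect(c("SYMBOL", "GENENAME", "GO"), columns(db))

# One row per gene/annotation combination, so a gene may appear several times.
annot <- select(db, keys = gene_ids, columns = wanted, keytype = "ENTREZID")
head(annot)
```

The resulting data frame can then be merged with expression results on the ENTREZID column.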