Every once in a while I get a request from various researchers to host data for visualization in the UCSC Genome Browser. After having generated hundreds of links, I've decided to move toward their track hub system. The downside is the need for a trackDB.txt file, which contains all of the meta information for the tracks.

First, for all you non-bioinformatics / genetics / science people, a little context. The UCSC Genome Browser is a way to visualize the various datasets we produce in the lab alongside their surrounding genomic context. What does a gene look like to us? A bar...

[Screenshot: a gene as displayed in the UCSC Genome Browser]

The exons are the boxy things, while the introns are the spaces between the boxy things. This helps us understand the structure of genes and where they are positioned relative to our reference genome (the coordinates at the top).

With this view in place, we can load in other data and see how it looks. For instance, for one of our experiments we see a really strong signal at the start of the SETD2 gene:

[Screenshot: strong signal at the start of the SETD2 gene]

In order to generate these visualizations, we need to let UCSC know where our data is. We do this by hosting the information in specific formats (like bigWig). We also need to set up the right configuration for the track hub to run. Each track in the hub needs a group (its parent) and a type (bigWig, bed, bam, etc.). The program takes both as command-line options:

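use Getopt::Long;

my ($group, $type);
GetOptions(
    'g|group:s' => \$group,   # parent group that the tracks belong to
    't|type:s'  => \$type,    # track type, e.g. bigWig
) or die("Error in command line arguments\n");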

Then just dump each of the required fields to the screen, one stanza per file:

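my @extraArgs = @ARGV;  # assumption: the arguments left after the options are the data file names
foreach (@extraArgs) {
    print "track " . $_ . "\n";
    print "parent " . $group . "\n";
    print "visibility dense\n";
    print "bigDataUrl " . $_ . "\n";
    print "shortLabel " . $_ . " " . $type . "\n";
    print "longLabel " . $group . " " . $_ . " " . $type . "\n";
    print "type " . $type . "\n";
    print "\n";  # blank line separates one track stanza from the next
}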

The result is a lot easier to manage: one hub link instead of the dozens of individual links normally required.
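
To make that concrete, here is roughly what a run looks like. The script name makeTrackDb.pl, the group myGroup, and the file sample1.bw are all made up for illustration, and this assumes that whatever arguments are left after the options are treated as the data file names.

```
$ perl makeTrackDb.pl -g myGroup -t bigWig sample1.bw > trackDB.txt
$ cat trackDB.txt
track sample1.bw
parent myGroup
visibility dense
bigDataUrl sample1.bw
shortLabel sample1.bw bigWig
longLabel myGroup sample1.bw bigWig
type bigWig
```

Passing more files on the same command line gives one stanza like this per file. The script as shown does not print a stanza for the parent group itself, so that would still need to be added separately.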