There are articles everywhere about advances in genetic research: finding the single gene that is the culprit for obesity, for diabetes, for cancer, and so on. In a sense, these articles take extremely complex research publications and break them down into a mainstream, digestible format for general consumption. Even so, the science really isn't as simple as those articles make it sound.
I work in a Next-Generation Sequencing Core Facility, and our workflow generally follows these steps:
- Researchers contact us with an experiment, and we work out the details of their design and system requirements.
- Researchers send us extracted and purified DNA or RNA.
- We take these samples and perform quality control on them. The quality-control step allows us to quantify the sample and check for degradation.
- After QC, we create sample libraries, which are essentially prepared samples that are ready to go on a sequencing machine.
- Once the libraries are validated, we place them onto one of our sequencing machines: the Illumina HiSeq 2000, Roche 454 GS FLX, or Life Technologies Ion Torrent PGM.
- After the machines do their thing, a massive amount of data is pushed down to our main computers. This is where I take over from our lab technicians.
- The data are initially processed and converted from vendor-specific formats to an industry-standard format, usually FASTQ.
- Once data are in this format, they are placed onto a download server and made available to researchers.
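To make the FASTQ step a little more concrete, here is a minimal sketch in Python of what reading such a file can look like. It assumes the common four-line layout (an `@` header, the sequence, a `+` separator, and a quality string) with no line wrapping. The function name `parse_fastq` and the two example reads are made up for illustration; this is not the software our facility actually runs.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream.

    Assumes simple four-line records; wrapped sequences are not handled.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ for " + header)
        yield header[1:], sequence, quality


# Two made-up reads, purely for illustration.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGACA\n+\nIIIIII\n"

records = list(parse_fastq(io.StringIO(EXAMPLE)))
for read_id, sequence, quality in records:
    print(read_id, sequence, len(sequence))
```

Each record pairs the letters of one read with a per-letter quality string, which is why the two are checked to be the same length.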
Now you have a brief overview of the workflow, but it is probably still a pretty vague concept. Most of you probably understand what DNA and RNA are: our fundamental genetic code that makes us who we are. . . .
So what does sequencing do? Essentially, we take a person's genetic material and determine its actual code: the order of A, C, G, and T that makes us who we are. We generate billions and billions of these letters, in a seemingly random order. The trick is figuring out what it all means. To do this, I take the ACGTACGT information and "map" it back to a reference file. Think of it like looking up the last name "Bard." You open your phone's contact list and scan first for the letter B and then A . . . and eventually you find a matching location. However, there is a small problem: there are many Bards. My parents, my siblings, my other relatives, and even completely unrelated people all share the name. By sequencing more letters in a single read, we make that read more likely to be unique. Currently we are able to sequence fifty to one hundred letters per read on one machine, and several hundred on another. This gives us a greater ability to look up where the sequence comes from. "BardJonathanE" would produce far fewer results than simply "Bard."
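The "Bard" versus "BardJonathanE" idea can be sketched in a few lines of Python. This is a toy example with exact matching against a short, made-up reference string. Real mapping software uses indexed references and tolerates mismatches, so treat the helper `find_matches` and the `REFERENCE` string as illustrative assumptions rather than how production mapping works.

```python
def find_matches(reference, read):
    """Return every start position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue one letter further along so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# A made-up 24-letter "reference genome" for illustration only.
REFERENCE = "ACGTTAGCACGTGGCTACGTTAGG"

short_hits = find_matches(REFERENCE, "ACGT")      # like searching for "Bard"
long_hits = find_matches(REFERENCE, "ACGTTAGG")   # like "BardJonathanE"

print("4-letter read matches at:", short_hits)
print("8-letter read matches at:", long_hits)
```

The four-letter read matches in three places, while the eight-letter read matches in only one, which is the same effect as adding more of the name to the contact search.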