The Lab Notes

The main theme of our research is to understand how gene regulation and genome organization tie in with each other. The Lab Notes are the latest headlines from the lab, featuring a collection of random thoughts and useful code snippets.

Barcode clustering with Starcode

Our team has recently published an article in Bioinformatics describing Starcode, our software for clustering short sequences.

In its first years, the lab focused on setting up the TRIP (Thousands of Reporters Integrated in Parallel) technology to study position effects. In a nutshell, we use the Sleeping Beauty transposon to integrate reporters into our favourite genome, but before doing so, we barcode each insertion with a random sequence of 20 nucleotides. The barcode allows us to track RNA expression, DNA repair, protein binding, etc. at each inserted reporter.
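Why is a random 20-nucleotide barcode enough to tell insertions apart? Here is a quick back-of-the-envelope count. This is our own illustration rather than a figure from the paper: it assumes barcodes are drawn uniformly at random, and the number of insertions n = 10^5 is a hypothetical value chosen only for the arithmetic.

$$4^{20} = 2^{40} = 1{,}099{,}511{,}627{,}776 \approx 1.1 \times 10^{12}$$

$$\mathbb{E}[\text{colliding pairs}] = \frac{\binom{n}{2}}{4^{20}}, \qquad n = 10^{5}: \quad \frac{4{,}999{,}950{,}000}{1.1 \times 10^{12}} \approx 4.5 \times 10^{-3}$$

So even with a hundred thousand insertions, two reporters sharing the same barcode by chance is very unlikely under these assumptions.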

Our typical experiment is to sequence RT-PCR products that contain all the barcodes expressed by a cell population. The abundance of each barcode tells us how much transcript each reporter produces. The only snag is that sequencing is not perfect, so many sequenced barcodes contain errors. We need to detect those errors and correct them to get an accurate tally of the counts. Since we do this all the time, we decided to turn it into a proper piece of software, with a name and all, and publish it as such.
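To make the problem concrete, here is a minimal Python sketch of naive barcode error correction. It is not how Starcode works internally; it is just an illustration. It assumes that an erroneous read sits within a few edits (Levenshtein distance) of its true barcode and is much rarer than that barcode. The function `merge_barcodes` and its parameters `max_dist` and `min_ratio` are hypothetical names chosen for this example.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def merge_barcodes(reads, max_dist=2, min_ratio=5):
    """Hypothetical naive correction: fold each barcode into the most abundant
    kept barcode within max_dist edits that is at least min_ratio times more frequent."""
    counts = Counter(reads)
    # Visit barcodes from most to least abundant, so parents are settled first.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    parent = {}
    for i, seq in enumerate(ordered):
        for candidate in ordered[:i]:
            if candidate in parent:
                continue  # only merge into barcodes that were themselves kept
            if (counts[candidate] >= min_ratio * counts[seq]
                    and levenshtein(seq, candidate) <= max_dist):
                parent[seq] = candidate
                break
    # Reassign the reads of every merged barcode to its parent.
    corrected = Counter()
    for seq, n in counts.items():
        corrected[parent.get(seq, seq)] += n
    return corrected


reads = (["ACGTACGTAC"] * 50 + ["ACGTACGTAA"] * 2   # one substitution error
         + ["TTTTGGGGCC"] * 30 + ["TTTTGGGGC"] * 1)  # one deletion error
print(merge_barcodes(reads))
```

Comparing every barcode against every more abundant one is quadratic in the number of distinct barcodes, which quickly becomes impractical for deep sequencing runs. Avoiding that all-pairs search is precisely the kind of problem a dedicated tool has to solve, so treat this sketch as an explanation of the goal rather than of Starcode's algorithm.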

Under the assumption that sequencing errors are rare, we expect that barcodes with errors are less frequent than barcodes...