The Lab Notes
The main theme of our research is to understand how gene regulation and genome organization tie in with each other. The Lab Notes are the latest headlines from the lab, featuring a collection of random thoughts and useful code snippets.
Bioinformaticians think about algorithms and discuss analyses and normalization, but at the end of the day, they spend the overwhelming majority of their time preparing the data. And yet, this essential step of any analysis has so far received little attention from the community. We recently tackled this problem for ChIP-seq data and came up with a couple of ideas that we built into a discretizer called Zerone.
Zerone was born from some old hidden Markov model routines from my post-doc, and from my frustration with ENCODE ChIP-seq data. A few years back, we discovered serious issues with about 20% of the ENCODE ChIP-seq profiles in K562, ranging from lack of reproducibility between replicates to total absence of signal. But what could we do? Just throw those profiles away because we did not like them? Surely there must be a more scientific way to approach this question.
We set out to identify a signature of low-quality data using a machine learning approach. This seemed difficult because there are so many ways for things to go wrong. Likewise, good ChIP-seq data is also very variable, so what you have to...