The Lab Notes

The main theme of our research is to understand how gene regulation and genome organization tie in with each other. The Lab Notes are the latest headlines from the lab, featuring a collection of random thoughts and useful code snippets.

How to convert PubMed references to BibTeX

Time and again, you need to write a paper in LaTeX with a lot of citations from PubMed, and PubMed does not export records in the BibTeX format. Fortunately, you do not have to go through the pain of a manual conversion. If you are familiar with basic scripting, this can be done fairly easily with the following steps.

Write your TeX document with PubMed citation numbers. Each article on PubMed has a PMID, which consists of digits only and is the key of the PubMed record of this article. For instance, to cite the PubMed article 22999052 in your TeX document, you would write

\citep{pmid22999052}

Extract the PMIDs from the TeX document. For instance, if your TeX document is called document.tex, you can do this at the Linux command line with

grep -oE "pmid[0-9]+" document.tex | sort -u | sed 's/pmid//'
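
If you would rather stay in Python, the same extraction can be sketched with a regular expression. This is only an illustration of the step above: the function name extract_pmids is hypothetical, and it assumes that citation keys always have the form pmid followed by digits.

```python
import re


def extract_pmids(tex_source):
    """Return the sorted, de-duplicated PMIDs cited as 'pmid<digits>'."""
    # Require at least one digit so that a bare 'pmid' is not reported.
    return sorted(set(re.findall(r"pmid([0-9]+)", tex_source)))


sample = r"See \citep{pmid22999052} and \citep{pmid21813512,pmid22999052}."
# Join with commas, the form expected when querying several records at once.
print(",".join(extract_pmids(sample)))  # prints 21813512,22999052
```

To process a real document, read it first, for instance with open("document.tex").read(), and pass the resulting string to the function.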

Use the eFetch API to get the PubMed records in XML format. Assuming that you now have a comma-separated list of PMIDs ready (in the example below, the PMIDs are 22999052,21813512), paste the following in your browser's address bar (or open this link in a new...
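
As a rough sketch of how such a query can be assembled in a script, the snippet below builds an eFetch URL from a list of PMIDs. It assumes the public NCBI E-utilities endpoint and the db, id and retmode parameters; the helper name efetch_url is hypothetical, and the URL it produces is not necessarily identical to the one used in the original post.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI E-utilities eFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an eFetch URL requesting the given PubMed records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    # Keep the commas between PMIDs readable instead of percent-encoding them.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


print(efetch_url(["22999052", "21813512"]))
```

The resulting URL can be opened in a browser or downloaded with urllib.request.urlopen; the XML that comes back still has to be converted to BibTeX in a later step.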

How to gunzip on the fly with Python

For a long time I wondered how R was able to recognize gzipped files and decompress them on the fly. This is neat because the large data files that we manipulate in bioinformatics are better kept compressed on disk and decompressed only when they are loaded into memory.

Most binary file formats start with a magic number that indicates the file type. A properly gzipped file starts with the two bytes 0x1F 0x8B. You need to read the first two bytes of the file; once you know whether it is compressed, you either read it as usual or read it with the functions of the gzip package.
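
The check itself fits in a few lines. The sketch below is written for Python 3; the function name is_gzipped is only illustrative.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of any gzip file


def is_gzipped(path):
    """Return True if the file at 'path' starts with the gzip magic number."""
    # Binary mode is required: the magic number is raw bytes, not text.
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

Note that an empty or one-byte file simply yields False, because fewer than two bytes are read.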

Here I wrote a small module called gzopen.py. After importing the class gzopen, you can use it to seamlessly open gzipped files.

# -*- coding:utf-8 -*-

import gzip

class gzopen(object):
    """Generic opener that decompresses gzipped files
    if needed. Encapsulates an open file or a GzipFile.
    Use the same way you would use 'open()'.
    """
    def __init__(self, fname):
        # Open in binary mode so the magic number is read as raw bytes.
        f = open(fname, 'rb')
        # Read magic number (the first 2 bytes) and rewind.
        magic_number = f.read(2)
        f.seek(0)
        # Encapsulated 'self.f' is a file or a GzipFile.
        if magic_number == b'\x1f\x8b':
            self.f = gzip...
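
Since the listing above is cut off, here is a self-contained sketch of the same idea for Python 3. It is not the original gzopen class: the function name open_text and the choice to return text-mode file objects are assumptions made for the example.

```python
import gzip


def open_text(path, encoding="utf-8"):
    """Open a plain or gzipped file for reading text, decompressing if needed."""
    # Peek at the first two bytes in binary mode to look for the magic number.
    with open(path, "rb") as f:
        magic_number = f.read(2)
    if magic_number == b"\x1f\x8b":
        # 'rt' makes gzip.open return decoded text rather than bytes.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "rt", encoding=encoding)
```

The returned object is used like any other file, for instance in a with block or a for loop over its lines, and the caller does not need to know whether the file on disk was compressed.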