BFCounter: Memory-efficient k-mer counting software
BFCounter is a program for counting k-mers in DNA sequence data. Counting k-mers (substrings of length k) is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction of the storage capacity, often more than 50%, may be spent on k-mers that contain sequencing errors and are typically observed only once in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction.
BFCounter identifies all the k-mers that occur more than once in a DNA sequence data set. It does this with a Bloom filter, a probabilistic data structure that records the observed k-mers implicitly and therefore needs far less memory than storing each k-mer explicitly.
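The idea can be illustrated with a minimal Python sketch. This is not BFCounter's code: the class BloomFilter, the functions kmers and count_repeated_kmers, the filter size, and the hashing scheme are all assumptions made for illustration. A first pass uses the Bloom filter to collect k-mers that appear to have been seen before; a second pass counts those candidates exactly, so Bloom filter false positives (k-mers that really occur only once) can be discarded. The sketch treats each k-mer as a plain string and does not merge a k-mer with its reverse complement.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny Bloom filter over strings; the default sizes are illustrative only."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k in read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_repeated_kmers(reads, k):
    """Return exact counts for k-mers occurring more than once.

    reads must be re-iterable (for example a list), since it is scanned twice.
    """
    seen = BloomFilter()
    candidates = set()

    # Pass 1: a k-mer the filter already reports as present is a candidate repeat.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                candidates.add(kmer)
            else:
                seen.add(kmer)

    # Pass 2: count candidates exactly, then drop false positives (count of 1).
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in candidates:
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n > 1}
```

For example, count_repeated_kmers(["ACGTACGT", "TTTT"], 4) returns {"ACGT": 2}. Only candidate k-mers are ever stored explicitly, so singletons cost a few bits in the filter rather than a full table entry.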
Publication: Melsted, P. and Pritchard, J.K.: Efficient counting of k-mers in DNA sequences using a bloom filter. BMC Bioinformatics 2011, 12:333.
Download BFCounter 0.2
Previous versions: BFCounter 0.1
For questions or comments write to pmelsted at gmail.com