Sherlocc is a program written in Perl that scans Pfam protein families for conserved regions with a low codon usage frequency (rare codon clusters). The approach is novel in that it is efficient enough to perform large-scale analysis of the proteome via the Pfam protein family database (which represents about 70% of the known protein universe).
The resources provided here make it possible to quickly analyse the whole Pfam dataset (over 11 000 protein families). The result of the analysis of each family can be viewed in a user-friendly HTML output that lets the user visualize the positions of rare codon clusters.
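To make the idea of a conserved rare codon cluster more concrete, here is a minimal Python sketch. It is not Sherlocc's actual implementation: the single shared codon usage table, the per-column averaging, the sliding window, the window size and all function names are assumptions made for illustration only, and the usage values are invented.

```python
from statistics import mean

GAP = "---"  # assumed gap symbol in a codon-level alignment


def column_means(alignment, usage):
    """Mean codon usage frequency per alignment column, ignoring gaps."""
    n_cols = len(alignment[0])
    means = []
    for col in range(n_cols):
        freqs = [usage[seq[col]] for seq in alignment if seq[col] != GAP]
        means.append(mean(freqs) if freqs else None)
    return means


def rare_windows(means, window, threshold):
    """Start indices of windows whose average column mean is below threshold."""
    hits = []
    for start in range(len(means) - window + 1):
        chunk = means[start:start + window]
        if None in chunk:
            continue  # skip windows containing an all-gap column
        if mean(chunk) < threshold:
            hits.append(start)
    return hits


# Invented usage values and a toy alignment of three codon sequences.
usage = {"CTG": 50.0, "CTA": 4.0, "AGG": 2.0, "GAA": 40.0}
alignment = [
    ["CTG", "CTA", "AGG", "GAA"],
    ["CTG", "AGG", "CTA", "GAA"],
    ["GAA", "CTA", "---", "CTG"],
]

hits = rare_windows(column_means(alignment, usage), window=2, threshold=18)
print(hits)
```

In this toy alignment only the window starting at column 1 is flagged, because the rare codons line up in columns 1 and 2 across the sequences, which is the kind of conservation the analysis looks for.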
You can use the Sherlocc Finder Interface to perform a filter-based search over all the results of the database.
The Pfam families found to contain rare codon clusters when analysed with a codon usage frequency threshold of 18 were grouped by structural similarity using the CATH database. You can download the resulting file by clicking the link on the right of the page. In this file, each structural category (defined at the CATH 'topology' hierarchical level) that contains at least two protein families with rare codon clusters is listed along with the accession IDs of its Pfam families.
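The grouping described above could be sketched as follows in Python. This is only an illustration under stated assumptions: the mapping from Pfam accession to CATH topology code is hypothetical placeholder data, the function name is invented, and the layout of the real downloadable file is not reproduced here.

```python
from collections import defaultdict


def group_by_topology(family_to_topology, min_families=2):
    """Group Pfam accessions by CATH topology, keeping groups of at least min_families."""
    groups = defaultdict(list)
    for family, topology in family_to_topology.items():
        groups[topology].append(family)
    return {
        topology: sorted(families)
        for topology, families in groups.items()
        if len(families) >= min_families
    }


# Hypothetical placeholder mapping, not taken from the Sherlocc results.
family_to_topology = {
    "PF00001": "3.40.50",
    "PF00002": "3.40.50",
    "PF00003": "1.10.10",
    "PF00004": "2.60.40",
    "PF00005": "2.60.40",
}

grouped = group_by_topology(family_to_topology)
print(grouped)
```

Topologies represented by a single family (here the placeholder "1.10.10") are dropped, mirroring the rule that only categories with at least two families with rare codon clusters are listed.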
Epub ahead of print
Chartier, M., Gaudreault, F., & Najmanovich, R. (2012). Large scale analysis of conserved rare codon clusters suggests an involvement in co-translational molecular recognition events. Bioinformatics (Oxford, England). doi:10.1093/bioinformatics/bts149