IsoMIF Finder Get Started Guide
The IsoMIF Finder interface allows you to identify binding site molecular interaction field (MIF) similarities between a query protein structure and a database of pre-calculated MIFs or MIFs calculated in user-defined cavities. For detailed information on IsoMIF, please see the papers below:
Chartier M. and Najmanovich R. (2015) Detection of Binding Site Molecular Interaction Field Similarities. Journal of Chemical Information and Modeling.
Chartier M., Adriansen É. and Najmanovich R. (2015) IsoMIF Finder: online detection of binding site molecular interaction field similarities. Bioinformatics, Oxford Journals.
See the example results page.
Submitting a job
1. Choosing a query PDB
The user must either enter a 4-character PDB code (the PDB file is then fetched from http://rcsb.org) or upload a PDB file.
2. Defining cavities on the query PDB
Query molecular interaction fields (MIFs) are calculated in cavities of the query PDB. The cavities are identified in a purely geometric manner with GetCleft, our in-house implementation of Surfnet, and can represent potential binding sites of small molecules. The user can either let GetCleft find the top N largest cavities (up to 5) or specify a crystallized ligand in the PDB file.
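GetCleft's internals and default parameters are not described here. As a rough illustration of what "purely geometric" means for a Surfnet-style method, the sketch below implements the classic gap-sphere idea: a sphere is placed in the middle of the gap between two atoms' van der Waals surfaces, then shrunk until it no longer penetrates any other atom. The function name `gap_spheres` and the radius limits `min_r`/`max_r` are illustrative assumptions, not GetCleft's actual code or defaults.

```python
import math
from itertools import combinations


def gap_spheres(atoms, min_r=1.5, max_r=4.0):
    """Surfnet-style gap spheres (illustrative sketch, not GetCleft's code).

    atoms: list of (x, y, z, vdw_radius) tuples, in Å.
    Returns a list of ((x, y, z), radius) tuples.
    """
    spheres = []
    for i, j in combinations(range(len(atoms)), 2):
        *a, ra = atoms[i]
        *b, rb = atoms[j]
        d = math.dist(a, b)
        gap = d - ra - rb              # empty space between the two vdW surfaces
        if gap < 2 * min_r:
            continue                   # the sphere can only shrink, so it would be too small
        r = gap / 2.0
        t = (ra + r) / d               # fraction of the a->b segment at the middle of the gap
        centre = tuple(p + t * (q - p) for p, q in zip(a, b))
        # Shrink the sphere until it no longer penetrates any other atom.
        for k, atom in enumerate(atoms):
            if k in (i, j):
                continue
            *xyz, rk = atom
            r = min(r, math.dist(centre, xyz) - rk)
        if min_r <= r <= max_r:        # keep spheres within the allowed radius range
            spheres.append((centre, r))
    return spheres


if __name__ == "__main__":
    # Two atoms 10 Å apart, with a third atom partly filling the gap between them.
    print(gap_spheres([(0, 0, 0, 1.5), (10, 0, 0, 1.5), (5, 3, 0, 1.5)]))
    # -> [((5.0, 0.0, 0.0), 1.5)]
```

The third atom caps the sphere radius at 1.5 Å; without it, the sphere between the first two atoms would have a radius of 3.5 Å.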
Find top N cavities
The image below shows the top 5 cavities of PDB 6COX. The red cavity is the largest and encompasses a large volume. Because cavities can be this large, once the cavities of the query protein are identified, the user can crop them, keeping only the volumes of interest in each cavity.
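Ranking cavities by size can be sketched as follows. For illustration only, the sketch assumes that overlapping spheres belong to the same cavity and that cavity size is measured by the number of spheres (the "Nb. spheres" column of the user cavity table is described as a relative measure of volume). The names `cluster_cavities` and `top_n` are hypothetical; GetCleft's actual clustering rule may differ.

```python
import math


def cluster_cavities(spheres):
    """Group spheres into cavities: two spheres are in the same cavity if they overlap.

    spheres: list of ((x, y, z), radius). Returns a list of clusters (lists of spheres),
    largest first, so index 0 corresponds to cavity rank 1.
    """
    parent = list(range(len(spheres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i in range(len(spheres)):
        for j in range(i + 1, len(spheres)):
            (ci, ri), (cj, rj) = spheres[i], spheres[j]
            if math.dist(ci, cj) < ri + rj:   # the two spheres overlap
                parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(spheres):
        groups.setdefault(find(i), []).append(s)
    return sorted(groups.values(), key=len, reverse=True)


def top_n(spheres, n=5):
    """Keep the n largest cavities (the web form allows at most 5)."""
    return cluster_cavities(spheres)[:min(n, 5)]


if __name__ == "__main__":
    demo = [((0, 0, 0), 1.0), ((1.5, 0, 0), 1.0), ((3, 0, 0), 1.0),
            ((20, 0, 0), 1.0), ((21, 0, 0), 1.0), ((50, 0, 0), 1.0)]
    print([len(c) for c in cluster_cavities(demo)])  # sphere counts per cavity
```

A union-find structure is used so that chains of overlapping spheres end up in one cavity even when their end spheres do not touch.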
Cavity of a bound ligand
The cavity in contact with a specified ligand can also be identified, and the cavity volume surrounding the ligand cropped using a distance threshold in Å specified by the user. For this, the following information, as found in the PDB file (without spaces), must be given in addition to the distance:
- residue name - 3 characters - columns [18-20]
- residue number - 4 digits - columns [23-26]
- chain - 1 character - column [22]
- alternate location (optional) - 1 character - column [17]
- distance around the ligand
The column numbers refer to the fixed-width columns of the PDB file. For instance, the PDB file 6COX.pdb contains a bound p-bromo derivative of celecoxib. Inspecting the structure with PyMOL shows that its residue name is S58, its residue number 701 and its chain B. This can also be seen in the PDB file itself, where the p-bromo derivative of celecoxib is represented by HETATM records such as:
HETATM 9145  C1  S58 B 701      67.218  14.089  40.653  1.00 23.73           C
The information to enter is therefore residue name S58, residue number 701 and chain B, with no alternate location.
Entering this information in the form, as in the figure below, yields a MIF calculated in the pocket where the celecoxib derivative binds, up to 3Å around the molecule, as depicted in the following figure.
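The fixed-width column positions given for the ligand fields can be used to pick out a ligand's atoms from a PDB file. The sketch below parses ATOM/HETATM records and then crops cavity spheres around the ligand. The cropping rule (keep a sphere if its centre lies within the distance threshold of any ligand atom), the function names and the example sphere coordinates are assumptions for illustration; IsoMIF Finder's exact cropping criterion may differ.

```python
import math


def parse_atom_record(line):
    """Extract fields from a fixed-width PDB ATOM/HETATM record (1-based columns in comments)."""
    return {
        "alt_loc": line[16].strip(),        # column 17
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21].strip(),          # column 22
        "res_num": int(line[22:26]),        # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),  # columns 31-54
    }


def ligand_atoms(pdb_lines, res_name, res_num, chain, alt_loc=""):
    """Coordinates of all atoms of one ligand, identified as in the web form."""
    coords = []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        rec = parse_atom_record(line)
        if (rec["res_name"], rec["res_num"], rec["chain"]) != (res_name, res_num, chain):
            continue
        if alt_loc and rec["alt_loc"] not in ("", alt_loc):
            continue                         # skip atoms belonging to other alternate locations
        coords.append(rec["xyz"])
    return coords


def crop_spheres(spheres, lig_xyz, distance=3.0):
    """Keep cavity spheres whose centre is within `distance` Å of a ligand atom (assumed rule)."""
    return [(c, r) for c, r in spheres
            if any(math.dist(c, a) <= distance for a in lig_xyz)]


if __name__ == "__main__":
    line = "HETATM 9145  C1  S58 B 701      67.218  14.089  40.653  1.00 23.73           C"
    lig = ligand_atoms([line], "S58", 701, "B")
    spheres = [((68.0, 14.0, 41.0), 1.6), ((90.0, 14.0, 40.0), 2.0)]  # made-up spheres
    print(crop_spheres(spheres, lig, distance=3.0))  # keeps only the first sphere
```

Slicing by column is needed because PDB fields are defined by position, not by whitespace; for example, a blank alternate location would shift every field if the line were split on spaces.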
3. Choosing what to compare the query MIF to
The next step is to choose what the MIF(s) calculated in the cavities of the query PDB will be compared to. There are two options: a set of pre-calculated MIFs, or a set of MIFs calculated in cavities defined by the user.
Compare to pre-calculated MIFs
- Human Purinome (2643 entries) - A dataset of human proteins bound to purine-containing ligands. The MIFs are calculated around the purine-containing ligands.
- scPDB (8077 entries) - The druggable binding sites of the 2014 release of scPDB. MIFs are calculated around the bound molecules.
- Pisces (14459 entries) - A non-redundant subset of the PDB obtained with Pisces at 2.0Å resolution and a 30% sequence identity threshold. The MIFs are calculated in the top 2 cavities of each PDB chain entry of the dataset. Cavities with more than 250 residues in contact are discarded.
- Drug-Target complexes (412 entries) - Protein-ligand complexes of drugs bound to their primary target (or a homologue of the primary target). The list of complexes was taken from a CSV file available on rcsb.org at http://www.rcsb.org/pdb/ligand/drugMapping.do. The final set of 412 entries excludes DNA and RNA structures and is non-redundant over all combinations of ligand name, number and chain.
Compare to cavities from user PDB list
Query MIFs can also be compared to MIFs calculated in cavities of PDB structures chosen by the user. For this, the user enters one PDB entry per line, in any of the 3 formats described below. The first 100 cavities found while traversing this list are retained.
- 1LHU - Finds the largest cavity of PDB 1LHU.
- 1LHU 3 - Finds the top 3 largest cavities of 1LHU.
- 1LHU EST 301 A - Finds the cavity in contact with ligand EST 301 A. The ligand is described as above: residue name, number, chain and (optionally) alternate location. The cavity is cropped at 3Å around the ligand.
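The three line formats can be parsed as in the sketch below. The dictionary layout, the function name `parse_cavity_list` and the way the 100-cavity limit is counted (one cavity per ligand line, N per "top N" line) are assumptions for illustration; the server counts the cavities GetCleft actually finds, which may be fewer than requested.

```python
def parse_cavity_list(text, max_cavities=100, crop_distance=3.0):
    """Parse the user PDB list (one entry per line) into cavity requests (sketch)."""
    requests, total = [], 0
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue                                   # ignore blank lines
        pdb = fields[0].upper()
        if len(pdb) != 4:
            raise ValueError(f"line {lineno}: expected a 4-character PDB code, got {fields[0]!r}")
        if len(fields) <= 2:                           # "1LHU" or "1LHU 3"
            n = int(fields[1]) if len(fields) == 2 else 1
            if n < 1:
                raise ValueError(f"line {lineno}: the number of cavities must be at least 1")
            req = {"pdb": pdb, "mode": "largest", "n": n}
        elif len(fields) in (4, 5):                    # "1LHU EST 301 A" (+ optional alt. location)
            req = {"pdb": pdb, "mode": "ligand", "n": 1,
                   "res_name": fields[1].upper(), "res_num": int(fields[2]),
                   "chain": fields[3], "alt_loc": fields[4] if len(fields) == 5 else "",
                   "distance": crop_distance}
        else:
            raise ValueError(f"line {lineno}: unrecognized format {raw!r}")
        remaining = max_cavities - total
        if remaining <= 0:
            break                                      # the cavity limit has been reached
        req["n"] = min(req["n"], remaining)            # truncate the last request if needed
        requests.append(req)
        total += req["n"]
    return requests


if __name__ == "__main__":
    for req in parse_cavity_list("1LHU\n1LHU 3\n1LHU EST 301 A\n"):
        print(req)
```

Chain identifiers are left as typed because PDB chain IDs can be case-sensitive.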
4. Parameters
The user can adjust the parameters below; the values that gave the best results are selected by default, as in the figure below.
- Grid spacing: the length of the edges of the grid built in the cavity volume; at each grid vertex, all probes evaluate a potential interaction. A smaller spacing samples potential interactions at a higher resolution. A benchmark for molecular function prediction gave the best results with 1.5Å.
- Geometric distance: the tolerance on the difference in distance between corresponding pairs of vertices. Higher thresholds allow probes identified as corresponding between two MIFs to be in more variable relative positions, which helps account for conformational variability.
- Number of top results: in the results page, tables show the top hits for each query cavity. The number of top hits to display can be defined here. The list of all entries sorted by similarity can be downloaded in CSV format.
- PyMOL sessions to generate: PyMOL sessions showing the two proteins superimposed according to the similarities found are generated for the top N hits, where N is set here.
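The role of the geometric distance threshold can be illustrated with a toy association-graph formulation, used here as an assumption rather than as IsoMIF's implementation. Each node pairs a query vertex with a target vertex that share a probe type. Two nodes are connected when the distance between the two query vertices differs from the distance between the two target vertices by at most the threshold. The largest clique is then a set of mutually consistent correspondences, and its size plays the role of the "number of nodes". The function names and data layout are hypothetical, and the plain Bron-Kerbosch search is only suitable for tiny inputs.

```python
import math
from itertools import combinations


def association_graph(mif_a, mif_b, eps):
    """mif_*: list of ((x, y, z), set_of_probe_types). eps: geometric distance threshold (Å)."""
    nodes = [(i, j) for i, (_, pa) in enumerate(mif_a)
             for j, (_, pb) in enumerate(mif_b) if pa & pb]   # vertex pairs sharing a probe type
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue                  # each vertex may be matched only once
        da = math.dist(mif_a[i1][0], mif_a[i2][0])
        db = math.dist(mif_b[j1][0], mif_b[j2][0])
        if abs(da - db) <= eps:       # internal distances preserved within the tolerance
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def max_clique(adj):
    """Largest clique via basic Bron-Kerbosch without pivoting (exponential; toy inputs only)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:           # r is a maximal clique
            if len(r) > len(best):
                best = r
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


if __name__ == "__main__":
    query = [((0, 0, 0), {"hyd"}), ((3, 0, 0), {"hyd"}), ((0, 4, 0), {"hyd"})]
    target = [((10, 10, 10), {"hyd"}), ((13, 10, 10), {"hyd"}), ((10, 15.5, 10), {"hyd"})]
    for eps in (0.5, 2.0):
        print(eps, sorted(max_clique(association_graph(query, target, eps))))
```

In this example the third target vertex is displaced by 1.5Å, so a tight threshold matches only 2 vertex pairs while a looser one tolerates the displacement and matches all 3, mirroring how the parameter trades precision for tolerance to conformational variability.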
5. Email notification
The user can specify an email address at which notifications will be sent when the job is submitted (with a URL to the job page), when the cavities are identified and when the job is done.
Cropping and selecting cavities
Query cavities
Once the cavities of the query protein have been identified (an email is sent), the volume in which the MIFs will be calculated can be cropped. This constrains the MIFs to a more specific region of the binding site.
The query cavities are shown in a JSMOL window. The cavity volume is shown as light red spheres, and cavity-lining residues are shown as sticks and listed on the right. A check-all button checks all residues, residue names can be displayed on the structure, and the cavity spheres can be hidden. The user can zoom and rotate the molecule, and translate it in the X-Y plane by holding Shift, double-clicking, and dragging while holding down the second click.
Unchecking a residue removes the spheres that have a surface point within 3Å of any atom of that residue. In the image above, residues GLN192B, ARG513B and ALA516B are unchecked and shown as semi-transparent sticks in the JSMOL window. A whole cavity can also be unchecked (the checkbox next to "6COX - Query Cavity 1" in the image above) if the user does not want it compared to the comparison set.
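The residue-unchecking rule can be expressed in a few lines. For an atom outside a sphere, the nearest point of the sphere surface lies at the centre-to-atom distance minus the sphere radius, so the sketch below removes a sphere when that value is 3Å or less for any atom of an unchecked residue. The data layout, the function name and treating atoms as points (ignoring atomic radii) are assumptions for illustration.

```python
import math


def remove_near_residues(spheres, atoms, unchecked, cutoff=3.0):
    """Drop spheres with a surface point within `cutoff` Å of an unchecked residue's atom.

    spheres:   list of ((x, y, z), radius)
    atoms:     list of (residue_id, (x, y, z)), e.g. ("GLN192B", (1.0, 2.0, 3.0))
    unchecked: set of residue ids the user unchecked
    """
    blocked = [xyz for res, xyz in atoms if res in unchecked]
    # distance(centre, atom) - radius is the gap between the sphere surface and the atom;
    # atoms inside a sphere give a negative gap and also cause removal.
    return [(c, r) for c, r in spheres
            if all(math.dist(c, a) - r > cutoff for a in blocked)]


if __name__ == "__main__":
    spheres = [((0, 0, 0), 1.5), ((10, 0, 0), 1.5)]
    atoms = [("GLN192B", (4, 0, 0)), ("ALA516B", (10, 5, 0))]
    print(remove_near_residues(spheres, atoms, {"GLN192B"}))  # the first sphere is removed
```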
User cavities (comparison set)
If the user provided PDB entries for the comparison set, the cavities found for each entry are listed in a table, with the information about each cavity described below.
Each cavity can be unchecked so that it is not compared to the query MIFs.
- PDB: the PDB code
- Cavity rank: cavities are ranked by size; rank 1 is the largest cavity
- HETATMs: the ligand information if the user specified a ligand around which to define the cavity
- Chains in contact: the PDB chains in contact with the cavity
- HETATMs with the cavity: the residue name, number and chain of ligands in contact with or within the cavity
- Nb. residues in contact: number of residues in contact with the cavity
- Nb. spheres: the number of spheres that define the volume of the cavity (relative measure of volume).
- GIF and PNG: snapshots of the cavity to see where the cavity is located in the protein structure.
Clicking the GIF or PNG of a cavity shows a snapshot of the protein with the cavity volume as a mesh, to help see where the cavity lies relative to the whole structure.
Job info and log
The job page always shows a header table with information about the job. The user can delete the job at any time by clicking the "Delete permanently from server" button.
Below this table, a log shows the job progress and is updated every ~2 minutes. A refresh button reloads the page.
Analyzing the results
Once the MIFs of each query cavity have been compared to all MIFs of the comparison set, an email is sent to the user with a link to the results page. On this page, each query cavity is shown in a separate table, with the most similar MIFs sorted by similarity. The results tables and PyMOL files can be downloaded for offline analysis.
Online results table
Once the MIF similarities are identified, the top hits are shown in a table, one for each query cavity.
For each query cavity:
- The query MIF can be viewed as a PyMOL session or as a GIF or PNG snapshot. In all cases, the 6 probe types are color-coded, and in the PyMOL session a separate object shows all the vertices where the MIF is sampled.
- A CSV file sorted by Tanimoto similarity or by number of nodes can be downloaded.
- The table can display top hits ranked by Tanimoto similarity or by number of nodes. The two rankings can yield different top hits.
- The table can also display entries from different Pfam families.
For each top hit (each row), the table gives the following information about the cavity and the protein:
- Entry name, with a link to the PDB website
- Tanimoto similarity with Z-score
- Number of nodes (absolute similarity) with Z-score
- Size of the search space for the query and the target
- Name, number, chain and alternate location of the crystallized ligand (except for the Pisces dataset)
- Pfam family and UniProt entry of the top hit (based on the PDB chains in contact with the hit's MIF)
- Protein name
- PyMOL session for full visualization of the similarities
- GIF and PNG snapshots of the similarities
If no PyMOL session was generated for a given top hit, click the green plus sign. This launches a job to generate the PyMOL session. If an email address was provided for the job, an email is sent when this job is done. After refreshing the page, the PyMOL session will be available for download.
The example below shows the results for a query MIF calculated in the cavity where the p-bromo derivative of celecoxib binds COX-2 (PDB 6COX). Hovering over the PNG or GIF of a top hit shows a snapshot of the similarities, represented by color-coded probes. The figure below shows a snapshot of the similarities found between the query MIF and that of a carbonic anhydrase, the top hit ranked #5.
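To make the two rankings concrete, the sketch below computes a Tanimoto-style score from the number of matched nodes and the sizes of the two search spaces, a Z-score for each hit relative to all hits in the comparison set, and a top-hit list that can keep only the best entry per Pfam family. The Tanimoto formula used here, nodes / (query size + target size - nodes), is the generic set form and an assumption, as are the field names and the one-hit-per-family interpretation; IsoMIF's exact normalization may differ.

```python
import statistics


def tanimoto(nodes, size_query, size_target):
    """Generic Tanimoto: matched / (query + target - matched). Assumed form."""
    return nodes / (size_query + size_target - nodes)


def add_zscores(hits, key):
    """Store the Z-score of each hit's `key` value relative to all hits."""
    values = [h[key] for h in hits]
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    for h in hits:
        h[key + "_z"] = (h[key] - mu) / sd if sd else 0.0
    return hits


def top_hits(hits, key="tanimoto", n=10, one_per_pfam=False):
    """Rank by `key` ("tanimoto" or "nodes"); optionally keep the best hit per Pfam family."""
    ranked = sorted(hits, key=lambda h: h[key], reverse=True)
    if one_per_pfam:
        seen, diverse = set(), []
        for h in ranked:
            if h["pfam"] not in seen:
                seen.add(h["pfam"])
                diverse.append(h)
        ranked = diverse
    return ranked[:n]


if __name__ == "__main__":
    size_query = 100  # made-up search-space size of the query MIF
    hits = [{"entry": "A", "nodes": 40, "size_target": 200, "pfam": "PF1"},
            {"entry": "B", "nodes": 30, "size_target": 60, "pfam": "PF1"},
            {"entry": "C", "nodes": 20, "size_target": 40, "pfam": "PF2"}]
    for h in hits:
        h["tanimoto"] = tanimoto(h["nodes"], size_query, h["size_target"])
    add_zscores(hits, "nodes")
    print([h["entry"] for h in top_hits(hits, "tanimoto")])  # relative similarity
    print([h["entry"] for h in top_hits(hits, "nodes")])     # absolute similarity
```

With these made-up values, the largest absolute match (A) comes last by Tanimoto because its target search space is large, which is why the two sort orders can produce different top hits.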
Downloadable PyMOL sessions
For further analysis, a PyMOL session ('pml' link) can be downloaded and used to analyse the similarities probe by probe and to inspect the underlying residues in both proteins. In this PyMOL session, the two proteins and their bound molecules are superimposed using the rotation matrix that best superimposes the matched probes. The probes are color-coded by intermolecular interaction type.
Below is the PyMOL session of the cyclooxygenase-2 query structure (6COX) with the similarities found to the carbonic anhydrase (1RJ6).
The PyMOL session contains different objects:
- qry6COX_1 and 1RJ6_AZM400A- are the query 6COX and top hit 1RJ6 PDB structures.
- Objects that start with hyd_, arm_, don_, acc_, pos_ or neg_ (hydrophobic, aromatic, hydrogen bond donor, hydrogen bond acceptor, positive and negative probes) represent probes that were found similar in both cavities.
- Objects that start with mif_ are semi-transparent spheres that show the initial search space in both cavities. The mif_ spheres that remain visible when the probe objects are displayed thus represent MIF points that were not found similar between the two cavities.
The spheres come in two sizes: larger spheres represent the query protein and smaller spheres represent the top hit protein. At a given vertex of a MIF, more than one probe type may be found similar to the second MIF. Similarities can therefore overlap, and it is best to visualize each probe type separately, as in the figure below, to get a clearer picture of how the similarities are distributed.
This is done in PyMOL by displaying only the objects related to a given probe; for example, the hydrophobic probes found similar between the two MIFs are hyd_qry6COX_1 and hyd_1RJ6_AZM400A-.
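Isolating one probe type can also be scripted instead of clicking through the object panel. The sketch below writes a short PyMOL command script that hides every object and then shows only the two structures and the objects for one probe type, using PyMOL's standard `disable` and `enable` commands. The object names follow the 6COX/1RJ6 example; the helper name `probe_view_pml` and the output file name are hypothetical.

```python
PROBES = {"hyd": "hydrophobic", "arm": "aromatic", "don": "H-bond donor",
          "acc": "H-bond acceptor", "pos": "positive", "neg": "negative"}


def probe_view_pml(query_obj, hit_obj, probe):
    """Return PyMOL commands that show only the two structures and one probe type."""
    if probe not in PROBES:
        raise ValueError(f"unknown probe prefix {probe!r}; expected one of {sorted(PROBES)}")
    names = [query_obj, hit_obj, f"{probe}_{query_obj}", f"{probe}_{hit_obj}"]
    lines = ["disable all"] + [f"enable {name}" for name in names]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = probe_view_pml("qry6COX_1", "1RJ6_AZM400A-", "hyd")
    print(script)
    # Save it (e.g. as hyd_only.pml) and run it inside PyMOL with: @hyd_only.pml
```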
The top hits found for the query 6COX represent potential off-targets of celecoxib. In principle, they also suggest that the molecules bound to the top hits could bind the query cavity. This remains to be verified experimentally with binding and activity assays. Celecoxib has been shown to bind to carbonic anhydrases I, II, IV and IX (Weber et al. 2004). PDB 1RJ6, a structure of carbonic anhydrase XIV, was found at rank 5.
The sulfonamide group is thought to be important for binding to carbonic anhydrases. Inspecting the residues surrounding the similarities near the sulfonamide can help to identify favorable interactions within the off-target binding site.
The figure below shows the p-bromo derivative of celecoxib and the surrounding residues in 6COX and the top hit 1RJ6. In 1RJ6, the sulfonamide group interaction would be mediated by a Zn2+ ion coordinated by 3 histidines, a region that corresponds to histidine 90 in 6COX. Together these residues create a region favorable for negatively charged probes (magenta spheres), which in this case would correspond to the lone pairs of the oxygen and of the nitrogen of the sulfonamide. The other oxygen of the sulfonamide can act as an acceptor for the hydroxyl of THR199. In 6COX, the corresponding hydrogen bond donor would be GLN192.
Hydrophobic residues at corresponding positions in 6COX and 1RJ6 produce a large hydrophobic similarity patch, as can be seen in the figure below. Superimposing the structures based on the similarities places many pairs of hydrophobic residues (LEU, VAL and ALA) in corresponding positions.
When given to glaucomatous rabbits, celecoxib decreased intra-ocular pressure (Weber et al. 2004). This example demonstrates how IsoMIF Finder can be used to identify binding site similarities and to interpret the results in a drug repurposing context.