MED-SuMo software

The Protein Data Bank (PDB) is a unique public source of macromolecular structures, in many cases with co-crystallized ligand(s). MED-SuMo software makes it possible to compare and superpose any 3D molecular interaction surface across the PDB. It opens new approaches in structure-based drug design to molecular modelers, crystallographers and medicinal chemists. For a binding site of interest or a defined full protein surface, MED-SuMo retrieves all PDB structures that share a 3D network of intermolecular interactions such as charges, H-bonds, hydrophobic contacts and aromatic stacking.



Surface chemical features: in MED-SuMo, the notion of chemical features is fundamental. Every macromolecular structure is first converted into a set of chemical features. Only features that are available for interacting with ligands are selected; these are named Surface Chemical Features (SCFs). Several types of chemical features are defined by default and can easily be modified. Each type represents a given property. Only chemical features of the same type can be compared and possibly considered equivalent. The default dictionary of SCFs, such as H-bonds, formal charges, hydrophobic groups and aromatic groups, is shown below in Fig.1.

Fig.1: MED-SuMo comparison procedure. (1) Graph construction. (a) Surface Chemical Features (SCFs) are displayed on the protein structure through a lexicographic analysis of the PDB files. (b) Their positions and orientations are checked to discard SCFs potentially involved in internal interactions or associated with buried atoms. (c) SCFs are gathered in triangles. (d) The triangle network is then stored as a graph data structure with the triangles as vertices and with edges connecting adjacent triangles. (2) Graph comparison. (e) The query graph (in green) is compared to the database graphs (in pink); compatible triangles, i.e. triangles formed by compatible SCFs, are selected. (f) Multiple corresponding graphs are found.
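
The graph construction in steps (c) to (e) can be illustrated with a minimal Python sketch. It is not the MED-SuMo implementation: the feature list is hypothetical, "adjacent" is assumed to mean that two triangles share a side, and compatibility is reduced to matching SCF types, whereas a real comparison would also involve geometry.

```python
from itertools import combinations

# Hypothetical SCFs as (type, label). Real SCFs also carry 3D positions and
# orientations, which this sketch leaves out.
features = [
    ("guanidinium", "feature A"),
    ("hydrophobic", "feature B"),
    ("h_bond_donor", "feature C"),
    ("aromatic", "feature D"),
]

def build_triangle_graph(n_features):
    """Return (triangles, edges): triangles are the vertices of the graph,
    edges link adjacent triangles."""
    triangles = list(combinations(range(n_features), 3))
    edges = [
        (a, b)
        for a, b in combinations(range(len(triangles)), 2)
        # Assumption: two triangles are adjacent when they share two SCFs.
        if len(set(triangles[a]) & set(triangles[b])) == 2
    ]
    return triangles, edges

def triangle_types(feats, triangle):
    """Order-independent tuple of the SCF types forming a triangle."""
    return tuple(sorted(feats[i][0] for i in triangle))

def compatible(query_feats, query_tri, db_feats, db_tri):
    """Treat two triangles as compatible when their SCF types match."""
    return triangle_types(query_feats, query_tri) == triangle_types(db_feats, db_tri)

triangles, edges = build_triangle_graph(len(features))
print(len(triangles), "triangles,", len(edges), "adjacency edges")
```

Storing triangles rather than single features as vertices means that every vertex match already implies three matching features, which keeps the number of candidate correspondences manageable.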

Two kinds of MED-SuMo databases: site databases and full surface databases: in drug design applications, MED-SuMo is designed to take advantage of the ongoing exponential growth of the public Protein Data Bank [3], where experimental 3D protein-ligand structures are stored. MED-SuMo is a fast and reliable technology to query and mine the largest available database of macromolecular 3D structures.

The reference database, the Protein Data Bank, is freely and publicly available and contains (as of April 2008) the 3D atomic coordinates of:
– 136,000 ligands bound to macromolecules (8,000 are distinct)
– 50,000 macromolecules (47,000 proteins)

A MED-SuMo site database contains the graphs of SCF triplets that lie in the environment of a ligand. This environment is defined by a maximum distance between the atoms of the ligand and the chemical features, 4.5 Å or 6.0 Å in most cases (user defined). 6.0 Å corresponds to a broader binding site definition around a ligand and is a better choice for site detection and functional annotation. 4.5 Å is better suited to drug design applications. A ligand is defined as a set of heteroatoms or a small peptide. A full surface database corresponds to the whole surface, i.e. all features except those that are buried or involved in intraprotein H-bonds. The corresponding MED-SuMo databases encode the whole PDB (as of April 2008):
– Site database containing 136,000 ligand site descriptions
– Full surface database containing 50,000 full surface descriptions (protein, RNA and DNA can be described)
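
The distance-based site definition can be sketched as follows. This is only an illustration under stated assumptions: the `(type, position)` feature tuples and the function name `site_features` are hypothetical, and only the 4.5 Å and 6.0 Å cutoffs come from the text above.

```python
from math import dist

def site_features(features, ligand_atoms, cutoff=4.5):
    """Keep the features lying within `cutoff` (in Å) of at least one ligand atom.

    features     : list of (feature_type, (x, y, z)) tuples (hypothetical layout)
    ligand_atoms : list of (x, y, z) ligand atom coordinates
    """
    return [
        feature
        for feature in features
        if any(dist(feature[1], atom) <= cutoff for atom in ligand_atoms)
    ]

# Toy data: one ligand atom at the origin, three features at 3, 5 and 7 Å.
ligand = [(0.0, 0.0, 0.0)]
toy_features = [
    ("hydrophobic", (3.0, 0.0, 0.0)),
    ("h_bond_donor", (5.0, 0.0, 0.0)),
    ("aromatic", (7.0, 0.0, 0.0)),
]

narrow_site = site_features(toy_features, ligand, cutoff=4.5)  # drug design setting
broad_site = site_features(toy_features, ligand, cutoff=6.0)   # annotation setting
print(len(narrow_site), len(broad_site))
```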

4 comparison modes can be exploited:

MED-SuMo server parameters: the triangle network of chemical features can be tuned to be more or less dense. The maximum length of an edge and the maximum sum of the three edges are tunable parameters when the database is generated:
– High density triangle network (BEST): parameters 20-60
– Default density triangle network (FAST): parameters 13-39
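
A short sketch of how the two parameters could restrict the triangle network is given below. It assumes that the values are in Å and that "20-60" means a maximum edge length of 20 and a maximum edge sum of 60; the function and preset names are hypothetical.

```python
from itertools import combinations
from math import dist

# (maximum edge length, maximum sum of the three edges), assumed to be in Å
PRESETS = {"BEST": (20.0, 60.0), "FAST": (13.0, 39.0)}

def select_triangles(points, max_edge, max_sum):
    """Return the index triples whose triangle satisfies both limits."""
    kept = []
    for i, j, k in combinations(range(len(points)), 3):
        sides = (
            dist(points[i], points[j]),
            dist(points[j], points[k]),
            dist(points[i], points[k]),
        )
        if max(sides) <= max_edge and sum(sides) <= max_sum:
            kept.append((i, j, k))
    return kept

# Toy feature positions (hypothetical coordinates).
points = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (0.0, 9.0, 0.0), (0.0, 0.0, 15.0)]

fast = select_triangles(points, *PRESETS["FAST"])
best = select_triangles(points, *PRESETS["BEST"])
print(len(fast), "triangles with FAST,", len(best), "with BEST")
```

In both presets the sum limit equals three times the edge limit, so the second test only becomes restrictive when the two values are tuned independently.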

When surface chemical features match, they are then tested for a similar shape environment. The shape threshold is a tunable parameter when the comparison is run:
– Default shape threshold: ST=65%
– Lower shape threshold to allow more tolerance in the shape comparison: ST=45%

[1] Jambon M, Imberty A, Deléage G, Geourjon C “A new bioinformatic approach to detect common 3D sites in protein structures” Proteins: Struct., Funct., and Gen. 52:137-145 (2003)
[2] Jambon M, Andrieu O, Combet C, Deléage G, Delfaud F, Geourjon C “The SuMo server: 3D search for protein functional sites” Bioinformatics 21(20):3929-3930 (2005)
[3] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE “The Protein Data Bank” Nucleic Acids Research 28:235-242 (2000)

Working with MED-SuMo:

Input: MED-SuMo takes as input a macromolecular structure database of up to hundreds of thousands of structures:
– the PDB (default)
– in house curated database
– models

Output:
– Ligands are aligned, shown in the graphical user interface and output as they are defined in the original PDB file.
– In-house curation of PDB ligands can be applied to the output of MED-SuMo.
– MEDIT provides optional bond order and aromaticity perception for PDB ligands.
– Superposed ligands and/or proteins can be exported in PDB, SDF and Mol2 file formats.

Find the most interesting hits with the MED-SuMo score:
Working with MED-SuMo is easy and fast. With a simple click, superpositions of structures and fragments can be displayed in the 3D viewer. Hits are retrieved and ranked in a spreadsheet according to the MED-SuMo score, which takes into account chemical features and shape and is very efficient at identifying the best hits:
– Only true positives are found at the top of the list (MED-SuMo score > 6.0)
– Interfamily hits are found from MED-SuMo score 6.0 to 3.0
– False positives have a drastically different signature than true positives and they can easily be filtered out
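
The score ranges above can be turned into a simple triage helper. This is a sketch only: the hit records and their field names are hypothetical, and the labels merely restate the ranges quoted in the list.

```python
def classify_hit(score):
    """Map a MED-SuMo score to the ranges quoted in the text."""
    if score > 6.0:
        return "top of list"
    if score >= 3.0:
        return "interfamily range"
    return "below documented ranges"

# Hypothetical hits; real results come from the MED-SuMo result table.
hits = [
    {"name": "hit A", "score": 4.1},
    {"name": "hit B", "score": 7.2},
    {"name": "hit C", "score": 2.5},
]

# Rank by decreasing score, as in the result spreadsheet.
ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
for hit in ranked:
    print(hit["name"], hit["score"], classify_hit(hit["score"]))
```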

MED-SuMo server:
– Runs on Linux systems
– Takes advantage of the most recent hardware architectures: multicore processors and computer clusters
– Command Line User Interface (CLUI) with Lua scripts
– MED-SuMo databases are easy to update from the PDB website

MED-SuMo graphical user interface (GUI):
– Runs on Windows: XP, Vista, 7, 8, 10

The GUI features:

1. Connect to the server

Fig.: The configuration window enables the connection to the MED-SuMo server. The IP address and port number are required to connect to the server. A profile corresponding to MED-SuMo databases and a login/password is defined for each user. A drop-down menu helps to select the precomputed MED-SuMo databases available on the server. The profile needs to be defined for each user by the administrator with the MED-Manager. Databases are either site databases (e.g. db45) or full surface databases (dbfull). db45 stands for a database of sites defined by the 4.5 Å environment of the ligand. Other databases are computed with different triplet parameters (e.g. 13-39 or 20-60).

2. Build the query: The query is a surface of a protein (locally stored or from the database). The surface is described by a graph of triplets of surface chemical features.

Visualize the triplets of surface chemical features:

Fig.: Graphical representation of a query: protein (cartoon, blue); ligand (stick, carbon atoms grey); MED-SuMo Surface Chemical Features (balls and sticks). A few chemical features are labeled with the associated residue (e.g. ARG 221) and their type (e.g. guanidinium). The SuMo Objects window helps to visualize the query as it is processed by the MED-SuMo heuristic (i.e. as a graph of triplets of Surface Chemical Features). By clicking on one object (hydrophobic Phe 182), all the triplets containing this object are shown in the 3D viewer as semi-transparent light blue triangles.

Choose residues or atoms to define the query or change the rendering

Fig.: Graphical representation of a protein query (lines, carbon atoms grey); ligand (stick, carbon atoms grey); MED-SuMo Surface Chemical Features (balls and sticks). A few chemical features are labeled with the associated residue (e.g. ARG 221) and their type (e.g. guanidinium). The treeview window helps to select residues or atoms in the viewer and to manually define the query or change the rendering.

Choose the query with auto-detected sites:

Fig.: Graphical representation of a query: protein (lines, carbon atoms grey); ligand (stick, carbon atoms grey); MED-SuMo Surface Chemical Features (balls and sticks). A few chemical features are labeled with the associated residue (e.g. ARG 221) and their type (e.g. guanidinium). The Query window / Tab Reference Protein helps to choose the definition of the query. Here, an auto-detected binding site is selected. Binding sites are auto-detected from the presence of a co-crystallized ligand, heteroatoms or a small peptide (by default fewer than 10 residues).

Choose the query by defining an advanced user selection:

Fig.: Query window / Tab Reference Protein showing the advanced mode. A query can be defined by a logical combination of rules. The rules are stated as (around or equal) from (residue, atom, atoms, object, object type, residue type, manual query). Here an example is shown where all the objects of the protein are selected (full surface) except the environment of the ligand IZ3.

Choose the database: The MED-SuMo databases (Sites or Full) are precomputed. The user chooses the database at the moment of the connection. Before launching the job, the user then chooses which structures to compare the query against: the whole database, a subset of the database, or a single locally stored structure.

Fig.: The Database explorer window helps to select a subset of the structure database. Here the PDB is searched with the keyword phosphatase. Other fields that can be queried are shown in the drop-down menu. Using a subset helps to focus on a given family, on X-ray structures with a chosen minimal resolution or on structures from a given author… The list of PDB codes can be saved as a file with the extension .sub. It can be used in the Query window to launch a run on a subset of the database.
Fig.: The Query window / Tab Compare To helps to choose whether a database or a single structure (which could be stored locally) is used. A subset of the database can be chosen by entering the names in the window (e.g. 4-letter codes for PDB files); this subset can be built with the Database explorer.

3. Browse the results in the result table

Fig.: The Result window is a spreadsheet containing the hits. By default the hits are sorted by decreasing MED-SuMo score. The most important columns are shown: hit number, hit ligand 2D depiction, checkboxes to superpose the hits (protein and ligand) to the query in the 3D viewer window, the PDB code, the PDB ligand code, the MED-SuMo score, the signature of 3D chemical features. The MED-SuMo site database contains synthetic ligands, peptides and peptidomimetics.

4. View hits superposed to the query in the 3D viewer

Fig.: The 3D viewer window helps to visualize the superpositions in 3D, which are based on the superposition of the Surface Chemical Features. A hit protein (orange) is superposed to the query (grey); the carbon atoms of each ligand are shown in the color of its protein. The Surface Chemical Features are shown; they match in pairs.

5. Sort and select results

Fig.: The Row Selection Window helps to select hits. Here a rule is created to keep only hits with a minimal MED-SuMo score of 4. Other rules can be added and combined in a logical expression.
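
Combining selection rules in a logical expression can be pictured with small predicate functions. The sketch below does not reproduce the Row Selection Window; the rule names, the hit fields and the ligand codes are hypothetical.

```python
def min_score(threshold):
    """Rule: keep hits whose score is at least `threshold`."""
    return lambda hit: hit["score"] >= threshold

def ligand_is(code):
    """Rule: keep hits whose ligand code equals `code`."""
    return lambda hit: hit["ligand"] == code

def all_of(*rules):
    """Logical AND of several rules."""
    return lambda hit: all(rule(hit) for rule in rules)

def any_of(*rules):
    """Logical OR of several rules."""
    return lambda hit: any(rule(hit) for rule in rules)

def negate(rule):
    """Logical NOT of a rule."""
    return lambda hit: not rule(hit)

# Hypothetical hits with placeholder ligand codes.
hits = [
    {"name": "hit A", "ligand": "LG1", "score": 6.3},
    {"name": "hit B", "ligand": "LG2", "score": 4.0},
    {"name": "hit C", "ligand": "LG1", "score": 3.2},
]

# Minimal score of 4, excluding hits bound to the placeholder ligand LG2.
rule = all_of(min_score(4), negate(ligand_is("LG2")))
selected = [hit["name"] for hit in hits if rule(hit)]
print(selected)
```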

6. Selected hits can be clustered according to their SCF signature

Fig.: The dendrogram window helps to choose the number of clusters by moving a slide bar. The signature consensus cutoff can be set. It corresponds to the percentage of occurrences of a given feature in the consensus signature.

View results clustered according to their signature
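
The signature consensus cutoff can be illustrated with a minimal sketch. It assumes that a signature can be represented as a set of feature labels and that a feature enters the consensus when it occurs in at least the given percentage of the hits in a cluster; both the representation and the function name are hypothetical.

```python
from collections import Counter

def consensus_signature(signatures, cutoff_percent):
    """Return the features present in at least `cutoff_percent` % of the signatures."""
    if not signatures:
        return set()
    # Count each feature once per signature, even if it is listed twice.
    counts = Counter(feature for sig in signatures for feature in set(sig))
    total = len(signatures)
    return {
        feature
        for feature, count in counts.items()
        if 100.0 * count / total >= cutoff_percent
    }

# Hypothetical cluster of four hit signatures.
cluster = [
    {"guanidinium", "hydrophobic", "aromatic"},
    {"guanidinium", "hydrophobic"},
    {"guanidinium", "hydrophobic"},
    {"guanidinium"},
]

print(sorted(consensus_signature(cluster, 75)))
print(sorted(consensus_signature(cluster, 100)))
```

Raising the cutoff keeps only the features shared by nearly all members of a cluster, while lowering it produces a broader consensus.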

MEDP-Fragmentor is part of the MED-SuMo fragment-based technology and generates the MED-SuMo fragment database from a database such as the PDB.

The MED-Portions are encoded in the MED-SuMo fragment database, which is ready to be searched and browsed with MED-SuMo. These 3D patterns include chemical moieties that match molecules from a chemical library and are substructures of protein-bound ligands. MED-Portions are the MED-SuMo representation of protein-fragment patterns and are defined by several criteria: (1) a chemical moiety whose atoms topologically match a molecule from molecular libraries, e.g. synthetically accessible molecules or building blocks, (2) open valences filled by ‘dummy atoms’ that indicate where the moiety was connected in the original ligand, and (3) the protein interaction surface surrounding that chemical moiety, described by the MED-SuMo Surface Chemical Features (SCFs).
