T016 · Protein-ligand interactions

Note: This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.

Authors:

Aim of this talktorial

In this talktorial, we focus on protein-ligand interactions. Understanding such interactions, which are driving molecular recognition, are fundamental in drug design.

To this end, we use two Python tools: the first one, called the Protein–Ligand Interaction Profiler, or PLIP, to get insight into protein-ligand interactions for any sample complex and the second, NGLView, to visualize the interactions in 3D.

Contents in Theory

  • Protein-ligand interactions

  • PLIP: Protein–Ligand Interaction Profiler

    • Web service

    • Algorithm

  • Visualization: complex and interactions

Contents in Practical

  • PDB complex: example with EGFR

  • Profiling protein-ligand interactions using PLIP

  • Table of interaction types

  • Visualization with NGLView

    • Analysis of interactions

References

Theory

Protein-ligand interactions

Ligand binding is mainly governed by non-covalent interactions between the ligand and the surface of the protein pocket or the protein-protein interface. This process is a function of electrostatic and shape complementarities, induced fitting, desolvation processes and more.

Some quotes from the literature:

Adapted from José L. Medina-Franco, Oscar Méndez-Lucio, Karina Martinez-Mayorga:

Understanding protein–ligand interactions (PLIs) and protein–protein interactions (PPIs) is at the core of molecular recognition and has a fundamental role in many scientific areas. PLIs and PPIs have a broad area of practical applications in drug discovery including but not limited to molecular docking, structure-based design, virtual screening of molecular fragments, small molecules, and other types of compounds, clustering of complexes, and structural interpretation of activity cliffs, to name a few.

These interactions can be rationalized in several ways, which for example opens the door to systematic analysis of the docking solutions.

protein ligand non-covalent interactions

Figure 1 : Frequency of non-colavent interactions. Out of 750,873 ligand–protein atom pairs extracted from complexes from the PDB data base, the 100 most frequent pairs can be grouped into seven interaction types, as shown in the figure (taken from the paper by de Freitas and Schapira, A systematic analysis of atomic protein–ligand interactions in the PDB, 2017).

There are several programs to assess protein-ligand interactions in an automated way.

For example, the Kinase-Ligand Interaction Fingerprints and Structures database (KLIFS) is a kinase centric tool with a freely available web service. More details can be found in Talktorial T012: “Data acquisition from KLIFS”.

PLIP: Protein–Ligand Interaction Profiler

For a more general set of proteins, PLIP is also popular thanks to its publicly available webserver and free-to-use Python library.

PLIP: web service

The PLIP web service is a freely available tool which allows to exhibit protein-ligand interactions from any PDB structure as shown in Figure 2 below.

aromatic interaction

Figure 2 : Visualization of protein-ligand interactions generated by the PLIP web service. The example displays an EGFR kinase complex, associated PDB ID 3POZ. The figure, a snapshot from the web service result, shows

  • the 3D plot of the protein and the ligand and the different interactions types detected (see legend on the right).

  • an example table describing the hydrophobic interactions between atoms from the protein and the ligand.

PLIP: algorithm

For each binding site, the PLIP algorithm considers the atoms from the protein and the ligand only if they lie within a certain distance cutoff. Once the protein and ligand atoms that potentially interact are identified, non-covalent interactions can be detected, such as:

  • hydrophobic interactions

  • hydrogen bonds

  • aromatic stacking

  • pi-cation interactions

  • salt bridges

  • water-bridged hydrogen bonds

  • halogen bonds

They are defined using geometry rules, such as distance and angle thresholds.

The supporting information accompanying the manuscript (Nucl. Acids Res. (2015), 43, W1, W443-W447) describes the binding site detection and protein-ligand interactions in detail, please see the PDF document below.

[1]:
from teachopencadd.utils import show_pdf
[2]:
pdf = (
    "https://www.ncbi.nlm.nih.gov/"
    "pmc/articles/PMC4489249/bin/supp_gkv315_nar-00254-web-b-2015-File003.pdf"
)
show_pdf(pdf)

Visualization: complex and interactions

We will use nglview for visualization. It’s a web-based molecular viewer that can be run on Jupyter notebooks. We will first use it in a basic way to visualize a complex of interest. Then, we will make use of ipywidgets layouts to visualize protein-ligand interactions.

More details can be found in Talktorial T017 on advanced NGLview & widget usage.

Practical

[3]:
# Import libraries
from pathlib import Path
import time
import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import nglview as nv
import openbabel
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
from plip.structure.preparation import PDBComplex
from plip.exchange.report import BindingSiteReport

from opencadd.structure.core import Structure
[4]:
# Absolute path
HERE = Path(_dh[-1])
DATA = HERE / "data"

PDB complex: example with EGFR

As a test case for this notebook, we choose the EGFR kinase. The considered PDB structure is given by the ID 3POZ. Let’s use nglview to visualize the structure in a notebook cell.

Note: the complex can easily be changed by adapting the PDB ID in the cell below.

[5]:
pdb_id = "3poz"

We now fetch the PDB structure file from the PDB using opencadd.structure.superposition. More details can be found in Talktorial T008 for protein data acquisition.

[6]:
pdb_file = Structure.from_pdbid(pdb_id)
# Download it to file for later use
pdb_file.write(DATA / f"{pdb_id}.pdb")

Show the complex based on PDB ID

[7]:
ngl_viewer = nv.show_pdbid(pdb_id)
# add the ligands
ngl_viewer.add_representation(repr_type="ball+stick", selection="hetero and not water")
# center view on binding site
ngl_viewer.center("ligand")
ngl_viewer
[8]:
# render a static image
ngl_viewer.render_image(trim=True, factor=2);
[9]:
ngl_viewer._display_image()
[9]:
../_images/talktorials_T016_protein_ligand_interactions_30_0.png

Profiling protein-ligand interactions using PLIP

PLIP offers a webserver for automated analysis, but unfortunately there is no API. We could try to use the HTML forms as if we were using the standard web UI, but since the library itself is Python-3 ready and very easy to install with pip, we can just use it locally for simplicity.

PLIP takes PDB files as input, so we can pass the PDB file to PLIP and let it do its magic. The BindingSiteReport class processes each detected binding site in PDBComplex and creates an object with the (eight) fields we are interested in, namely

  • hydrophobic interaction : hydrophobic

  • hydrogen bond : hbond

  • water bridge : waterbridge

  • salt bridge : saltbridge

  • \(\pi\)-stacking (parallel and perpendicular) : pistacking

  • \(\pi\)- cation : pication

  • halogen bond : halogen

  • metal complexation : metal

These fields are divided in <field>_features (containing column names) and <field>_info (containing the actual records). If we iterate over the object retrieving the correct attribute name with getattr(), we can compose a dictionary that can be passed to a pandas.DataFrame for nice overviews.

This dictionary is composed of two levels:

  • First level is the detected binding sites.

  • For each binding site, we have one more sub-dictionary containing eight lists, one for each specific interaction.

    • Each list will contain the column names in the first row, and the data (if available) in the following.

[10]:
def retrieve_plip_interactions(pdb_file):
    """
    Retrieves the interactions from PLIP.

    Parameters
    ----------
    pdb_file :
        The PDB file of the complex.

    Returns
    -------
    dict :
        A dictionary of the binding sites and the interactions.
    """
    protlig = PDBComplex()
    protlig.load_pdb(pdb_file)  # load the pdb file
    for ligand in protlig.ligands:
        protlig.characterize_complex(ligand)  # find ligands and analyze interactions
    sites = {}
    # loop over binding sites
    for key, site in sorted(protlig.interaction_sets.items()):
        binding_site = BindingSiteReport(site)  # collect data about interactions
        # tuples of *_features and *_info will be converted to pandas DataFrame
        keys = (
            "hydrophobic",
            "hbond",
            "waterbridge",
            "saltbridge",
            "pistacking",
            "pication",
            "halogen",
            "metal",
        )
        # interactions is a dictionary which contains relevant information for each
        # of the possible interactions: hydrophobic, hbond, etc. in the considered
        # binding site. Each interaction contains a list with
        # 1. the features of that interaction, e.g. for hydrophobic:
        # ('RESNR', 'RESTYPE', ..., 'LIGCOO', 'PROTCOO')
        # 2. information for each of these features, e.g. for hydrophobic
        # (residue nb, residue type,..., ligand atom 3D coord., protein atom 3D coord.)
        interactions = {
            k: [getattr(binding_site, k + "_features")] + getattr(binding_site, k + "_info")
            for k in keys
        }
        sites[key] = interactions
    return sites

We create the dictionary for the complex of interest:

[11]:
interactions_by_site = retrieve_plip_interactions(f"{DATA}/{pdb_id}.pdb")

Let’s see how many binding sites are detected:

[12]:
print(
    f"Number of binding sites detected in {pdb_id} : "
    f"{len(interactions_by_site)}\n"
    f"with {interactions_by_site.keys()}"
)
# NBVAL_CHECK_OUTPUT
Number of binding sites detected in 3poz : 4
with dict_keys(['03P:X:1023', 'SO4:X:1', 'SO4:X:2', 'SO4:X:3'])

In this case, the first binding site containing ligand 03P will be further investigated.

[13]:
index_of_selected_site = 0
selected_site = list(interactions_by_site.keys())[index_of_selected_site]
print(selected_site)
03P:X:1023

Table of interaction types

We can construct a pandas.DataFrame for a binding site and particular interaction type.

[14]:
def create_df_from_binding_site(selected_site_interactions, interaction_type="hbond"):
    """
    Creates a data frame from a binding site and interaction type.

    Parameters
    ----------
    selected_site_interactions : dict
        Precaluclated interactions from PLIP for the selected site
    interaction_type : str
        The interaction type of interest (default set to hydrogen bond).

    Returns
    -------
    pd.DataFrame :
        DataFrame with information retrieved from PLIP.
    """

    # check if interaction type is valid:
    valid_types = [
        "hydrophobic",
        "hbond",
        "waterbridge",
        "saltbridge",
        "pistacking",
        "pication",
        "halogen",
        "metal",
    ]

    if interaction_type not in valid_types:
        print("!!! Wrong interaction type specified. Hbond is chosen by default!!!\n")
        interaction_type = "hbond"

    df = pd.DataFrame.from_records(
        # data is stored AFTER the column names
        selected_site_interactions[interaction_type][1:],
        # column names are always the first element
        columns=selected_site_interactions[interaction_type][0],
    )
    return df

Hydrophobic interactions

In the next cell, we show the hydrophobic interactions from the selected binding site.

[15]:
create_df_from_binding_site(interactions_by_site[selected_site], interaction_type="hydrophobic")
[15]:
RESNR RESTYPE RESCHAIN RESNR_LIG RESTYPE_LIG RESCHAIN_LIG DIST LIGCARBONIDX PROTCARBONIDX LIGCOO PROTCOO
0 745 LYS X 1023 03P X 3.91 2398 320 (18.317, 32.25, 10.052) (20.469, 34.989, 8.267)
1 788 LEU X 1023 03P X 3.89 2396 593 (15.676, 34.766, 8.319) (16.314, 35.031, 4.495)
2 788 LEU X 1023 03P X 3.66 2383 595 (18.404, 30.743, 6.486) (18.317, 33.573, 4.169)
3 854 THR X 1023 03P X 3.82 2382 1138 (18.135, 32.543, 11.422) (17.798, 28.992, 12.797)
4 858 LEU X 1023 03P X 3.93 2383 1167 (18.404, 30.743, 6.486) (22.084, 30.736, 5.093)
5 745 LYS X 1023 03P X 3.53 2396 318 (15.676, 34.766, 8.319) (18.634, 36.648, 7.936)
6 790 THR X 1023 03P X 3.48 2396 611 (15.676, 34.766, 8.319) (12.875, 33.449, 9.914)

As you can notice, this table matches the figure in the Theory part of the notebook.

Hydrogen interactions

If hydrogen interactions are of interest, the table can be generated as follow:

[16]:
create_df_from_binding_site(interactions_by_site[selected_site], interaction_type="hbond")
[16]:
RESNR RESTYPE RESCHAIN RESNR_LIG RESTYPE_LIG RESCHAIN_LIG SIDECHAIN DIST_H-A DIST_D-A DON_ANGLE PROTISDON DONORIDX DONORTYPE ACCEPTORIDX ACCEPTORTYPE LIGCOO PROTCOO
0 793 MET X 1023 03P X False 2.01 2.96 163.57 True 629 Nam 2404 N2 (13.371, 34.064, 15.005) (10.667, 33.654, 16.145)

Halogen interactions

Let’s also have a look at halogen interactions:

[17]:
create_df_from_binding_site(interactions_by_site[selected_site], interaction_type="halogen")
[17]:
RESNR RESTYPE RESCHAIN RESNR_LIG RESTYPE_LIG RESCHAIN_LIG SIDECHAIN DIST DON_ANGLE ACC_ANGLE DON_IDX DONORTYPE ACC_IDX ACCEPTORTYPE LIGCOO PROTCOO
0 766 MET X 1023 03P X False 3.60 167.05 118.86 2389 F 431 O2 (12.164, 26.835, 3.777) (14.283, 28.118, 6.395)
1 790 THR X 1023 03P X True 3.47 171.27 103.84 2388 F 610 O3 (11.467, 31.629, 9.124) (13.867, 29.356, 8.056)

Visualization with NGLView

Now, let’s try to represent those interactions in the NGL viewer. We can draw cylinders between the interaction points (LIGCOO and PROTCOO in the pandas.DataFrame) and color-code them as shown in color_map, which uses RGB tuples.

[18]:
color_map = {
    "hydrophobic": [0.90, 0.10, 0.29],
    "hbond": [0.26, 0.83, 0.96],
    "waterbridge": [1.00, 0.88, 0.10],
    "saltbridge": [0.67, 1.00, 0.76],
    "pistacking": [0.75, 0.94, 0.27],
    "pication": [0.27, 0.60, 0.56],
    "halogen": [0.94, 0.20, 0.90],
    "metal": [0.90, 0.75, 1.00],
}

Let’s see what these RGB colors correspond to:

[19]:
fig, axs = plt.subplots(nrows=2, ncols=4, figsize=(15, 2))
plt.subplots_adjust(hspace=1)
fig.suptitle("Colors for interaction types", size=16, y=1.2)

for ax, (interaction, color) in zip(fig.axes, color_map.items()):
    ax.imshow(np.zeros((1, 5)), cmap=colors.ListedColormap(color_map[interaction]))
    ax.set_title(interaction, loc="center")
    ax.set_axis_off()
plt.show()
../_images/talktorials_T016_protein_ligand_interactions_58_0.png

Define a helper function to show the interactions.

[20]:
def show_interactions_3d(
    pdb_id, selected_site_interactions, highlight_interaction_colors=color_map
):
    """
    3D visualization of protein-ligand interactions.

    Parameters
    ----------
    pdb_id : str
        The pdb ID of interest.
    selected_site_interactions : dict
        Precaluclated interactions from PLIP for the selected site
    highlight_interaction_colors : dict
        The colors used to highlight the different interaction types.

    Returns
    -------
    NGL viewer with explicit interactions given by PLIP.
    """

    # Create NGLviewer
    viewer = nv.NGLWidget(height="600px", default=True, gui=True)
    # Add protein
    prot_component = viewer.add_pdbid(pdb_id)
    # Add the ligands
    viewer.add_representation(repr_type="ball+stick", selection="hetero and not water")

    interacting_residues = []
    for interaction_type, interaction_list in selected_site_interactions.items():
        color = highlight_interaction_colors[interaction_type]
        if len(interaction_list) == 1:
            continue
        df_interactions = pd.DataFrame.from_records(
            interaction_list[1:], columns=interaction_list[0]
        )
        for _, interaction in df_interactions.iterrows():
            name = interaction_type
            # add cylinder between ligand and protein coordinate
            viewer.shape.add_cylinder(
                interaction["LIGCOO"],
                interaction["PROTCOO"],
                color,
                [0.1],
                name,
            )
            interacting_residues.append(interaction["RESNR"])
    # Display interacting residues
    res_sele = " or ".join([f"({r} and not _H)" for r in interacting_residues])
    res_sele_nc = " or ".join([f"({r} and ((_O) or (_N) or (_S)))" for r in interacting_residues])
    prot_component.add_ball_and_stick(sele=res_sele, colorScheme="chainindex", aspectRatio=1.5)
    prot_component.add_ball_and_stick(sele=res_sele_nc, colorScheme="element", aspectRatio=1.5)
    # Center on ligand
    viewer.center("ligand")
    return viewer
[21]:
viewer_3d = show_interactions_3d(pdb_id, interactions_by_site[selected_site])
viewer_3d
[22]:
viewer_3d.render_image(trim=True, factor=2, transparent=True);
[24]:
viewer_3d._display_image()
[24]:
../_images/talktorials_T016_protein_ligand_interactions_63_0.png

Analysis of interactions

As we can see in the NGL viewer, PLIP manages to identify different interactions between the protein and the ligand in the binding site, for our kinase example (3poz):

  • The typical hinge hydrogen binding with methionine residue MET793.

  • Hydrophobic interactions with the following residues:

    • LYS745

    • LEU788

    • THR790

    • THR854

    • LEU858

  • Halogen interactions with residues:

    • MET766

    • LEU788

    • THR790

Note that for example the hinge interaction is equally identified in KLIFS, see 3poz KLIFS fingerprint, while the hydrophobic interactions identified with PLIP are only a subset of those in KLIFS. Halogen interactions are not explicitly annotated in KLIFS.

All the identified interactions in NGLview do indeed correspond to the table of interactions generated above.

Discussion

In this talktorial we have learned about protein-ligand interactions, more specifically in the context of the Protein–Ligand Interaction Profiler, or PLIP for short. We created a DataFrame to depict the interactions in a table. Furthermore, we made use of the NGL viewer to visualize these interactions in 3D, which require a good amount of web technologies, mainly based around the NGL viewer itself and ipywidgets layouts.

Quiz

  • Do some interactions seem more important than others?

  • What’s the main difference between hydrophobic interactions and hydrogen bonds? How are they similar?

  • What can be a considerable advantage of using PLIP over KLIFS?