T025 · Kinase similarity: Kinase pocket (KiSSim fingerprint)¶
Note: This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.
Authors:
Dominique Sydow, 2021, Volkamer lab, Charité
Talia B. Kimber, 2021, Volkamer lab, Charité
Andrea Volkamer, 2021, Volkamer lab, Charité
Aim of this talktorial¶
We will assess the similarity between a set of kinases from a structural point of view using the KiSSim fingerprint. This fingerprint describes the physicochemical and spatial properties in structurally resolved kinases.
Note: We focus on similarities between orthosteric kinase binding sites; similarities to allosteric binding sites are not covered.
Contents in Theory¶
Kinase dataset
Kinase similarity descriptor: Kinase pockets (KiSSim fingerprint)
Fetching KLIFS data with
opencadd.databases.klifs
Contents in Practical¶
Define the kinases of interest
Retrieve and preprocess data
Set up a remote KLIFS session
Fetch all structures describing these kinases
Filter structures
Show kinase coverage
Calculate KiSSim fingerprints
Compare structures
Map structure to kinase distance matrix
Save kinase distance matrix
References¶
Kinase dataset: Molecules (2021), 26(3), 629
Protein Data Bank
PDB URL: http://www.rcsb.org/
PDB database: Acta Cryst. (2002), D58, 899-907 and Structure (2012), 20(3), 391-396
KLIFS
KLIFS URL: https://klifs.net/
KLIFS database: Nucleic Acid Res. (2020), 49(D1), D562-D569
KLIFS binding site definition: J. Med. Chem. (2014), 57(2), 249-277
Binding site comparison reviews:
KiSSim: Kinase Structural Similarity
GitHub repository: https://github.com/volkamerlab/kissim
Documentation: https://kissim.readthedocs.io
opencadd
, a Python library for structural cheminformaticsGitHub repository: https://github.com/volkamerlab/opencadd
Documentation: https://opencadd.readthedocs.io
Theory¶
Kinase dataset¶
We use the kinase selection as defined in Talktorial T023.
Kinase similarity descriptor: Kinase pockets (KiSSim fingerprint)¶
Side effects often occur when a drug binds not only to its designated target (on-target) but also to other targets (off-targets) that share similar binding sites to form similar protein-ligand interaction patterns. Many binding site comparison tools have been proposed to predict similarities and potential off-targets (see Curr. Comput. Aided Drug Des. (2008), 4, 209-20 and J. Med. Chem. (2016), 9, 4121-51).
Here we use the novel KiSSim (Kinase Structure Similarity) fingerprint, which is based on the KLIFS pocket definition and alignment of \(85\) pocket residues (see more details in Talktorial T023).
The KiSSim fingerprint encodes each of the \(85\) residues in the KLIFS binding site with respect to physicochemical and spatial properties (Figure 1). Since all KLIFS pockets are aligned across the structurally covered kinome, we can compare the KiSSim fingerprints bit by bit. Physicochemical properties of each residue include pharmacophoric features, size, side chain orientation, and solvent exposure. Spatial properties describe the distance of each residue to defined important regions in the kinase pocket, for example the hinge region, which forms crucial hydrogen bonds to ligands, or the DFG region, whose conformation determines the activity state of the kinase. See more details on structural motifs in kinases in Talktorial T023.
Figure 1: The KiSSim fingerprint encodes physicochemical and spatial properties of a kinase binding site. Figure taken from: https://github.com/volkamerlab/kissim
Fetching KLIFS data with opencadd.databases.klifs
¶
opencadd
is a Python library for structural cheminformatics developed by the Volkamer lab at the Charité in Berlin. This library is a growing collection of modules that help facilitate and standardize common tasks in structural bioinformatics and cheminformatics. Today, we will use the module opencadd.databases.klifs
, which allows us to fetch the KLIFS structures as a pandas
DataFrame.
GitHub repository: https://github.com/volkamerlab/opencadd
Documentation: https://opencadd.readthedocs.io
For more information about this library and the KLIFS OpenAPI, please refer to Talktorial T012.
Practical¶
[1]:
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise
import matplotlib.pyplot as plt
import seaborn as sns
from opencadd.databases.klifs import setup_remote
import kissim
[2]:
HERE = Path(_dh[-1])
DATA = HERE / "data"
[3]:
configs = pd.read_csv(HERE / "../T023_what_is_a_kinase/data/pipeline_configs.csv")
configs = configs.set_index("variable")["default_value"]
DEMO = bool(int(configs["DEMO"]))
N_STRUCTURES_PER_KINASE = int(configs["N_STRUCTURES_PER_KINASE"])
N_CORES = int(configs["N_CORES"])
print(f"Run in demo mode: {DEMO}")
if not DEMO:
if N_STRUCTURES_PER_KINASE > 0:
print(f"Number of structures per kinase: {N_STRUCTURES_PER_KINASE}")
else:
print(f"Number of structures per kinase: all available structures")
print(f"Number of cores used: {N_CORES}")
# NBVAL_CHECK_OUTPUT
Run in demo mode: True
Define the kinases of interest¶
Let’s load the kinase selection as defined in Talktorial T023.
[4]:
kinase_selection_df = pd.read_csv(HERE / "../T023_what_is_a_kinase/data/kinase_selection.csv")
kinase_selection_df
# NBVAL_CHECK_OUTPUT
[4]:
kinase | kinase_klifs | uniprot_id | group | full_kinase_name | |
---|---|---|---|---|---|
0 | EGFR | EGFR | P00533 | TK | Epidermal growth factor receptor |
1 | ErbB2 | ErbB2 | P04626 | TK | Erythroblastic leukemia viral oncogene homolog 2 |
2 | PI3K | p110a | P42336 | Atypical | Phosphatidylinositol-3-kinase |
3 | VEGFR2 | KDR | P35968 | TK | Vascular endothelial growth factor receptor 2 |
4 | BRAF | BRAF | P15056 | TKL | Rapidly accelerated fibrosarcoma isoform B |
5 | CDK2 | CDK2 | P24941 | CMGC | Cyclic-dependent kinase 2 |
6 | LCK | LCK | P06239 | TK | Lymphocyte-specific protein tyrosine kinase |
7 | MET | MET | P08581 | TK | Mesenchymal-epithelial transition factor |
8 | p38a | p38a | Q16539 | CMGC | p38 mitogen activated protein kinase alpha |
Retrieve and preprocess data¶
Now, we query the KLIFS database using the opencadd.databases.klifs
module to fetch available structures in KLIFS.
Set up a remote KLIFS session¶
[5]:
from opencadd.databases.klifs import setup_remote
[6]:
klifs_session = setup_remote()
Fetch all structures describing these kinases¶
[7]:
# Get list of kinase names
kinase_names = kinase_selection_df["kinase_klifs"].to_list()
# Get all available structures for these kinases
structures_df = klifs_session.structures.by_kinase_name(kinase_names=kinase_names)
print(f"Number of structures: {len(structures_df)}")
print("Kinases:", *structures_df["kinase.klifs_name"].unique())
Number of structures: 2640
Kinases: CDK2 p38a EGFR ErbB2 MET LCK KDR BRAF p110a
Let’s have a look at what is stored in the structures’ DataFrame:
[8]:
structures_df.columns
[8]:
Index(['structure.klifs_id', 'structure.pdb_id', 'structure.alternate_model',
'structure.chain', 'species.klifs', 'kinase.klifs_id',
'kinase.klifs_name', 'kinase.names', 'kinase.family', 'kinase.group',
'structure.pocket', 'ligand.expo_id', 'ligand_allosteric.expo_id',
'ligand.klifs_id', 'ligand_allosteric.klifs_id', 'ligand.name',
'ligand_allosteric.name', 'structure.dfg', 'structure.ac_helix',
'structure.resolution', 'structure.qualityscore',
'structure.missing_residues', 'structure.missing_atoms',
'structure.rmsd1', 'structure.rmsd2', 'interaction.fingerprint',
'structure.front', 'structure.gate', 'structure.back', 'structure.fp_i',
'structure.fp_ii', 'structure.bp_i_a', 'structure.bp_i_b',
'structure.bp_ii_in', 'structure.bp_ii_a_in', 'structure.bp_ii_b_in',
'structure.bp_ii_out', 'structure.bp_ii_b', 'structure.bp_iii',
'structure.bp_iv', 'structure.bp_v', 'structure.grich_distance',
'structure.grich_angle', 'structure.grich_rotation',
'structure.filepath', 'structure.curation_flag'],
dtype='object')
[9]:
structures_df.head()
[9]:
structure.klifs_id | structure.pdb_id | structure.alternate_model | structure.chain | species.klifs | kinase.klifs_id | kinase.klifs_name | kinase.names | kinase.family | kinase.group | ... | structure.bp_ii_out | structure.bp_ii_b | structure.bp_iii | structure.bp_iv | structure.bp_v | structure.grich_distance | structure.grich_angle | structure.grich_rotation | structure.filepath | structure.curation_flag | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4341 | 3qql | A | A | Human | 198 | CDK2 | <NA> | <NA> | <NA> | ... | False | False | False | False | False | 14.3132 | 50.348301 | 67.737099 | <NA> | False |
1 | 4343 | 3qxp | A | A | Human | 198 | CDK2 | <NA> | <NA> | <NA> | ... | False | False | False | False | False | 14.7860 | 49.589199 | 53.625000 | <NA> | False |
2 | 4344 | 4d1x | - | A | Human | 198 | CDK2 | <NA> | <NA> | <NA> | ... | False | False | False | False | False | 13.7952 | 47.828999 | 67.880096 | <NA> | False |
3 | 4345 | 3r71 | B | A | Human | 198 | CDK2 | <NA> | <NA> | <NA> | ... | False | False | False | False | False | 14.7385 | 51.853500 | 64.533699 | <NA> | False |
4 | 4346 | 3qhw | A | A | Human | 198 | CDK2 | <NA> | <NA> | <NA> | ... | False | False | False | False | False | 14.2767 | 47.307201 | 52.166000 | <NA> | False |
5 rows × 46 columns
Filter structures¶
We filter the structures by different criteria:
Species: human
Conformation: DFG-in (the active kinase conformation)
Resolution: \(\le 3\) Angström
Quality score*: \(\ge 6\)
* The KLIFS quality score takes into account the quality of the alignment, as well as the number of missing residues and atoms. A higher score indicates a better structure quality.
[10]:
structures_df = structures_df[
(structures_df["species.klifs"] == "Human")
& (structures_df["structure.dfg"] == "in")
& (structures_df["structure.resolution"] <= 3)
& (structures_df["structure.qualityscore"] >= 6)
]
print(f"Number of structures: {len(structures_df)}")
print("Kinases:", *structures_df["kinase.klifs_name"].unique())
Number of structures: 1748
Kinases: CDK2 p38a EGFR ErbB2 MET LCK KDR BRAF p110a
Save the structure KLIFS IDs for the next step.
[11]:
structure_klifs_ids = structures_df["structure.klifs_id"].to_list()
print(f"Number of structures: {len(structure_klifs_ids)}")
Number of structures: 1748
Note for demo mode: To make it easier for us to maintain the talktorials, we will now load a set of frozen structure KLIFS IDs (2021-08-23) and continue to work with those.
Note for non-demo mode: Did you specify N_STRUCTURES_PER_KINASE
in the configuration file? If you e.g. set a value of 3, we will select in the following the top 3 structures per kinase in terms of resolution and KLIFS quality score.
[12]:
if DEMO:
# Load frozen dataset
print("Notebook is run in demo mode - load frozen structure set.")
structure_klifs_ids = pd.read_csv(DATA / "frozen_structure_klifs_ids.csv")[
"structure.klifs_id"
].to_list()
structures_df = structures_df[
structures_df["structure.klifs_id"].isin(structure_klifs_ids)
].copy()
else:
if N_STRUCTURES_PER_KINASE > 0:
print(f"Select {N_STRUCTURES_PER_KINASE} structures per kinase for downstream analysis.")
# Sort structures by kinase and quality
structures_df = structures_df.sort_values(
by=["kinase.klifs_name", "structure.resolution", "structure.qualityscore"],
ascending=[True, True, False],
)
# Reduce number of structures per kinase
structures_df = structures_df.groupby("kinase.klifs_name").head(N_STRUCTURES_PER_KINASE)
structure_klifs_ids = structures_df["structure.klifs_id"].to_list()
else:
print(f"Use all available structures per kinase for downstream analysis.")
print(f"Number of structures: {structures_df.shape[0]}")
# NBVAL_CHECK_OUTPUT
Notebook is run in demo mode - load frozen structure set.
Number of structures: 1620
Show kinase coverage¶
Let’s get the number of structures that describe our kinases (kinase coverage).
[13]:
# Use pandas' groupby method to count the number of structures (rows) per kinase
n_structures_per_kinase = structures_df.groupby("kinase.klifs_name").size().sort_values()
n_structures_per_kinase
# NBVAL_CHECK_OUTPUT
[13]:
kinase.klifs_name
ErbB2 4
KDR 6
LCK 32
p110a 49
BRAF 72
MET 99
p38a 151
EGFR 357
CDK2 850
dtype: int64
Let’s plot the results.
[14]:
fig, ax = plt.subplots()
n_structures_per_kinase.plot(kind="barh", ax=ax)
ax.set_xlabel("Number of structures")
ax.set_ylabel("Kinase name")
for i, value in enumerate(n_structures_per_kinase):
ax.text(value, i, str(value), va="center")
We see that our dataset is highly imbalanced. While some kinases are structurally resolved very often, other kinases are not. We will have to keep this in mind when interpreting our results later.
Calculate KiSSim fingerprints¶
We use the kissim
API to encode our structures as KiSSim fingerprints and save the fingerprints as CSV file.
Note for demo mode: We use pre-calculated KiSSim fingerprints for our kinase set (i.e. the next code cell will be skipped).
[15]:
if DEMO:
print("Notebook is run in demo mode - we will use pre-calculated fingerprints.")
else:
print("Calculate and save KiSSim fingerprints...")
# Calculate fingerprints
from kissim.api import encode
kissim_fingerprints = encode(structure_klifs_ids, n_cores=N_CORES)
# Save fingerprints in csv file
structure_klifs_ids = list(kissim_fingerprints.data.keys())
kissim_fingerprints_array = [
fingerprint.values_array().tolist()
for structure_klifs_id, fingerprint in kissim_fingerprints.data.items()
]
kissim_fingerprints_array = np.array(kissim_fingerprints_array)
kissim_fingerprints_df = pd.DataFrame(kissim_fingerprints_array, index=structure_klifs_ids)
kissim_fingerprints_df.to_csv(DATA / "kissim_fingerprints.csv")
# NBVAL_CHECK_OUTPUT
Notebook is run in demo mode - we will use pre-calculated fingerprints.
Let’s load the KiSSim fingerprints from the CSV file.
[16]:
kissim_fingerprints_df = pd.read_csv(DATA / "kissim_fingerprints.csv", index_col=0)
print(f"Matrix shape: {kissim_fingerprints_df.shape}")
print(f"Number of fingerprints: {kissim_fingerprints_df.shape[0]}")
print(f"Number of fingerprint bits: {kissim_fingerprints_df.shape[1]}")
# NBVAL_CHECK_OUTPUT
Matrix shape: (1611, 1032)
Number of fingerprints: 1611
Number of fingerprint bits: 1032
You might notice that we have slightly fewer fingerprints than structures. This can happen during the kissim
fingerprint generation, if a structure cannot be encoded.
Let’s have a look at the fingerprint DataFrame.
Kinase structures are stored by KLIFS ID (index)
Each kinase is represented by \(1032\) features, i.e. \(85\) residues * \(8\) physicochem. features + \(85\) residues * \(4\) distances + \(12\) moments
[17]:
kissim_fingerprints_df.head()
# NBVAL_CHECK_OUTPUT
[17]:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1022 | 1023 | 1024 | 1025 | 1026 | 1027 | 1028 | 1029 | 1030 | 1031 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6285 | 2.0 | 0.0 | 2.0 | -1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 1.0 | ... | 13.150351 | 11.958837 | 4.717011 | 4.843444 | 4.655707 | 3.577213 | 2.771821 | 4.302192 | 3.583341 | 2.066700 |
10568 | 2.0 | 0.0 | 2.0 | -1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 1.0 | ... | 13.069152 | 11.883944 | 4.691527 | 5.006221 | 4.679352 | 3.531177 | 2.714736 | 4.165350 | 3.549843 | 2.138838 |
11187 | 2.0 | 0.0 | 2.0 | -1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 1.0 | ... | 13.297023 | 11.991511 | 4.590040 | 5.141397 | 4.699467 | 3.625989 | 2.549692 | 4.442117 | 3.699695 | 2.261646 |
4060 | 2.0 | 0.0 | 2.0 | -1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 1.0 | ... | 12.910837 | 11.775556 | 4.359330 | 4.844833 | 4.214195 | 3.383812 | 2.699580 | 3.860920 | 3.161863 | 2.185979 |
10566 | 2.0 | 0.0 | 2.0 | -1.0 | 0.0 | 0.0 | 2.0 | 3.0 | 2.0 | 1.0 | ... | 13.196581 | 12.115342 | 4.701176 | 4.690081 | 4.683674 | 3.633418 | 2.489983 | 3.972552 | 3.692501 | 0.759234 |
5 rows × 1032 columns
Compare structures¶
Let’s make a pairwise comparison of the KiSSim fingerprints. We use the nan_euclidean_distances
method of sklearn
, which calculates the Euclidean distance between all pairwise vectors.
If two vectors have NaN values (which can happen if residues are not resolved in a structure), the following procedure is applied:
When calculating the distance between a pair of samples, this formulation ignores feature coordinates with a missing value in either sample and scales up the weight of the remaining coordinates.
Please find more information in the sklearn documentation.
[18]:
structure_distance_matrix_array = pairwise.nan_euclidean_distances(kissim_fingerprints_df.values)
[19]:
# Create DataFrame with structure KLIFS IDs as index/columns
structure_klifs_ids = kissim_fingerprints_df.index.to_list()
structure_distance_matrix_df = pd.DataFrame(
structure_distance_matrix_array, index=structure_klifs_ids, columns=structure_klifs_ids
)
print(f"Structure distance matrix size: {structure_distance_matrix_df.shape}")
print("Show matrix subset:")
structure_distance_matrix_df.iloc[:5, :5]
# NBVAL_CHECK_OUTPUT
Structure distance matrix size: (1611, 1611)
Show matrix subset:
[19]:
6285 | 10568 | 11187 | 4060 | 10566 | |
---|---|---|---|---|---|
6285 | 0.000000 | 13.256941 | 14.001474 | 26.391543 | 14.307291 |
10568 | 13.256941 | 0.000000 | 10.379779 | 27.882193 | 16.833932 |
11187 | 14.001474 | 10.379779 | 0.000000 | 30.962221 | 18.338492 |
4060 | 26.391543 | 27.882193 | 30.962221 | 0.000000 | 28.905189 |
10566 | 14.307291 | 16.833932 | 18.338492 | 28.905189 | 0.000000 |
Map structure to kinase distance matrix¶
Note: So far we compared individual structures, but we want to compare kinases (which can be represented by several structures, as plotted above).
First, as an intermediate step, we will use the structure distance matrix but — instead of labeling the data with structure KLIFS IDs — we use the corresponding kinase name.
[20]:
# Copy distance matrix to kinase matrix
kinase_distance_matrix_df = structure_distance_matrix_df.copy()
# Replace structure KLIFS IDs with the structures' kinase names
kinase_names = structures_df.set_index("structure.klifs_id").loc[
structure_klifs_ids, "kinase.klifs_name"
]
kinase_distance_matrix_df.index = kinase_names
kinase_distance_matrix_df.columns = kinase_names
print("Show matrix subset:")
kinase_distance_matrix_df.iloc[:5, :5]
# NBVAL_CHECK_OUTPUT
Show matrix subset:
[20]:
kinase.klifs_name | CDK2 | CDK2 | CDK2 | CDK2 | CDK2 |
---|---|---|---|---|---|
kinase.klifs_name | |||||
CDK2 | 0.000000 | 13.256941 | 14.001474 | 26.391543 | 14.307291 |
CDK2 | 13.256941 | 0.000000 | 10.379779 | 27.882193 | 16.833932 |
CDK2 | 14.001474 | 10.379779 | 0.000000 | 30.962221 | 18.338492 |
CDK2 | 26.391543 | 27.882193 | 30.962221 | 0.000000 | 28.905189 |
CDK2 | 14.307291 | 16.833932 | 18.338492 | 28.905189 | 0.000000 |
In this talktorial, we will consider per kinase pair the two structures that show the most similar pockets. Hence, we select the structure pair with the minimum distance as representative for a kinase pair.
[21]:
# We unstack the matrix (each pairwise comparison in a single row)
# We group by kinase names (level=[0, 1] ensures that the order of the kinases is ignored
# We take the minimum value in each kinase pair group
# We unstack the remaining data points
kinase_distance_matrix_df = (
kinase_distance_matrix_df.unstack().groupby(level=[0, 1]).min().unstack(level=1)
)
# Cosmetics: Remove the index and column names
kinase_distance_matrix_df.index.name = None
kinase_distance_matrix_df.columns.name = None
[22]:
print(
f"Structure matrix of shape {structure_distance_matrix_df.shape} "
f"reduced to kinase matrix of shape {kinase_distance_matrix_df.shape}."
)
# NBVAL_CHECK_OUTPUT
Structure matrix of shape (1611, 1611) reduced to kinase matrix of shape (9, 9).
[23]:
# Show matrix with background gradient
cm = sns.light_palette("green", as_cmap=True)
kinase_distance_matrix_df.style.background_gradient(cmap=cm).format("{:.3f}")
[23]:
BRAF | CDK2 | EGFR | ErbB2 | KDR | LCK | MET | p110a | p38a | |
---|---|---|---|---|---|---|---|---|---|
BRAF | 0.000 | 17.156 | 19.515 | 21.383 | 21.089 | 21.583 | 20.297 | 37.611 | 21.731 |
CDK2 | 17.156 | 0.000 | 18.147 | 21.106 | 19.880 | 17.973 | 18.304 | 36.780 | 19.481 |
EGFR | 19.515 | 18.147 | 0.000 | 16.392 | 17.282 | 16.467 | 17.498 | 36.046 | 22.128 |
ErbB2 | 21.383 | 21.106 | 16.392 | 0.000 | 23.851 | 23.881 | 22.563 | 41.277 | 24.682 |
KDR | 21.089 | 19.880 | 17.282 | 23.851 | 0.000 | 19.255 | 20.431 | 41.104 | 20.263 |
LCK | 21.583 | 17.973 | 16.467 | 23.881 | 19.255 | 0.000 | 19.221 | 39.022 | 22.457 |
MET | 20.297 | 18.304 | 17.498 | 22.563 | 20.431 | 19.221 | 0.000 | 39.414 | 21.983 |
p110a | 37.611 | 36.780 | 36.046 | 41.277 | 41.104 | 39.022 | 39.414 | 0.000 | 38.530 |
p38a | 21.731 | 19.481 | 22.128 | 24.682 | 20.263 | 22.457 | 21.983 | 38.530 | 0.000 |
Note: Since this is a distance matrix, lighter colors indicate similarity, darker colors dissimilarity.
Save kinase distance matrix¶
[24]:
kinase_distance_matrix_df.to_csv(DATA / "kinase_distance_matrix.csv")
Discussion¶
In this talktorial, we have assessed kinase similarity using the KiSSim fingerprints, which describe physicochemical and spatial properties of pockets in kinase structures. We have reduced a structure distance matrix to a kinase distance matrix by selecting for each kinase pair the corresponding closest structure pair.
We have to keep two elements in mind:
Pocket fingerprints like KiSSim encode the full pocket although some residues might be more important for ligand binding than others. As an alternative, one could instead use a subset of residues that have been shown to frequently interact with co-crystallized ligands.
We only compare here the two closest structures per kinase pair, although we have — at least for kinases such as EGFR and CDK2 — much more structural data available. Aggregating multiple structures is a reasonable strategy but has two disadvantages:
Some kinases have much higher coverage than others, leading to an imbalance in information content.
Aggregated kinase fingerprints may cause too much averaging, making pairwise kinase comparison noisier.
The kinase distance matrix above will be reloaded in Talktorial T028, where we compare kinase similarities from different perspectives, including the pocket structure perspective we have talked about in this talktorial.
Quiz¶
Can you think of reasons why it is important to include physicochemical and spatial properties in a fingerprint?
Can you think of a reason why the side chain orientation and solvent exposure may be interesting features to consider in a fingerprint?
Can you think of research questions for which you could make use of the structure distance matrix?