Scoring functions¶
Component scoring functions¶
The RxDock master scoring function (
the replacement of the crude steric potentials (
and ) with a true van der Waals potential,the introduction of two generalised terms for all short range attractive (
) and repulsive ( ) polar interactionsthe implementation of a fast weighed solvent accessible surface (WSAS) area solvation term
the addition of explicit dihedral potentials
van der Waals potential¶
We have replaced the
The quadratic potential is used at repulsive energies between
Empirical attractive and repulsive polar potentials¶
We continue to use an empirical Bohm-like potential to score hydrogen-bonding
and other short-range polar interactions. The original RiboDock polar terms
(
Individual interaction scores are the product of simple scaling functions for
geometric variables, formal charges and local neighbour density. The scaling
functions themselves, and the formal charge assignment method, are retained from
RiboDock. Metals are assigned a uniform formal charge of +1. C+ is considered to
be a weak donor in this context and scores are scaled by 50 % relative to
conventional donors by the assignment of
The most notable improvements to RiboDock are that attractive (hydrogen bond and
metal) interactions with ACC_LP and ACC_PLANE acceptors include terms for
Solvation potential¶
The desolvation potential in RxDock combines a weighted solvent-accessible surface area approach [WSAS2001] with a rapid probabilistic approximation to the calculation of solvent-accessible surface areas [RASASA1988] based on pairwise interatomic distances and radii ((14), taking into account (15), (16), (17), (18), (19), and (20)).
The calculation is fast enough therefore to be used in docking. We have redefined the solvation atom types compared to the original work [WSAS2001] and recalibrated the weights against the same training set of experimental solvation free energies in water (395 molecules). The total number of atom types (50, including 6 specifically for ionic groups and metals) is slightly lower than in original work (54). Our atom types reflect the fact that RxDock uses implicit non-polar hydrogens. The majority of types are a combination of hybridisation state and the number of implicit or explicit hydrogens. All solvation parameters are listed in Appendix section (Table 24).
Dihedral potential¶
Dihedral energies are calculated using Tripos 5.2 dihedral parameters for all ligand and site rotatable bonds. Corrections are made to account for the missing contributions from the implicit non-polar hydrogens.
Intermolecular scoring functions under evaluation¶
Training sets¶
We constructed a combined set of protein-ligand and RNA-ligand complexes for training of RxDock. Molecular data files for the protein-ligand complexes were extracted from the downloaded CCDC/Astex cleanlist [ASTEX2007] and used without substantive modification. The only change was to convert ligand MOL2 files to MDL SD format using Corina [CORINA1990], leaving the coordinates and protonation states intact.
Protein MOL2 files were read directly. The ten RNA-ligand NMR structures from the RiboDock validation set were supplemented with five RNA-ligand crystal structures (1f1t, 1f27, 1j7t, 1lc4, 1mwl) prepared in a similar way. All 15 RNA-ligand structures have measured binding affinities.
58 complexes (43 protein-ligand and 15 RNA-ligand) were selected for the initial fitting of component scoring function weights. Protein-ligand structures were chosen (of any X-ray resolution) that had readily available experimental binding affinities [PDBbind2004]. 102 complexes were used for the main validation of native docking accuracy for different scoring function designs, consisting of 87 of the 92 entries in the high resolution (R < 2 Å) clean-list (covalently bound ligands removed – 1aec, 1b59, 1tpp, 1vgc, 4est), and the 15 RNA-ligand complexes.
Scoring functions design¶
Component weights (
Ten intermolecular scoring functions were derived with various combinations of
terms (Table 4). SF0
is a baseline scoring function that has the van der Waals potential only. SF1
adds a simplified polar potential, without f2 (formal charge) and f3 (neighbour
density) scaling functions, and with a single acceptor type (ACC) that lacks
lone-pair directionality. SF2 has the full polar potential (f2 and f3 scaling
functions, ACC, ACC_LP and ACC_PLANE acceptor types) and adds the repulsive
polar potential. SF3 has the same functional form as SF2 but with empirical
weights in regular use at RiboTargets. SF4 replaces the repulsive polar
potential with the WSAS desolvation potential described above. SF5 has the same
functional form as SF4 but with empirical weights in regular use at RiboTargets.
SF6 combines the repulsive polar and desolvation potentials. SF7 has the same
functional form as SF2 and SF3 but with weights for
SF |
Rc |
RMSEc |
|||||||
---|---|---|---|---|---|---|---|---|---|
0 |
1.4 |
- |
- |
- |
- |
0 |
0 |
0.62 |
10.9 |
1 |
1.126 |
2.36 |
- |
- |
- |
0.217 |
0 |
0.64 |
10.2 |
2 |
1.192 |
2.087 |
- |
2.984 |
- |
0 |
0 |
0.63 |
10.4 |
3 |
1.000b |
3.400b |
- |
5.000b |
- |
0 |
0 |
0.59 |
10.9 |
4 |
1.317 |
3.56 |
0.449 |
- |
- |
0 |
4. |
0.67 |
9.6 |
5 |
1.500b |
5.000b |
0.500b |
- |
- |
0.568 |
4.782 |
0.62 |
10.7 |
6 |
1.314 |
4.447 |
0.500b |
5.000b |
- |
0 |
0 |
0.62 |
10.4 |
7 |
1.500b |
5.000b |
- |
5.000b |
- |
0.986 |
12.046 |
0.55 |
12.9 |
8 |
1.000b |
3.400b |
- |
5.000b |
-1.6b |
0 |
0 |
0.53 |
11.8 |
9 |
1.500b |
5.000b |
0.500b |
- |
-1.6b |
0.647 |
5.056 |
0.58 |
11.5 |
Scoring functions validation¶
The ability of the ten intermolecular scoring functions (SF0 to SF9) to
reproduce known ligand binding modes was determined on the combined test set of
102 protein-ligand and RNA-ligand complexes. The intra-ligand scoring function
(
Protein-ligand docking accuracy is remarkably insensitive to scoring function changes. Almost half of the ligand binding modes can be reproduced with a vdW potential only (SF0). The addition of a simplified polar potential (SF1) increases the accuracy to over 70 % of protein-ligand test cases predicted to within 2 Å RMSD. The success rate increases further to 78 % with SF3, which has the full attractive and repulsive polar potentials, and empirically adjusted weights relative to SF2. Subsequent changes to the component terms and weights, including the addition of the desolvation potential, have little or no impact on the protein-ligand RMSD metric.
The nucleic acid set shows a much greater sensitivity to scoring function changes. This can in part be explained by the smaller sample size that amplifies the percentage changes in the RMSD metric, but nevertheless the trends are clear. There is a gradual increase in docking accuracy from SF0 (37 %) to SF3 (52 %), but absolute performance is much lower than for the protein-ligand test set. This level of docking accuracy for nucleic acid-ligand complexes is broadly consistent with the original RiboDock scoring function, despite the fact that the original steric term (LIPO) has been replaced by a true vdW potential. The introduction of the desolvation potential in place of the empirical repulsive polar potential (in SF4 and SF5) results in a substantial improvement in accuracy, to around 70 % of test cases within 2 Å RMSD. Subsequent changes (SF6 to SF9) degrade the accuracy. The lower performance of SF7, which has the higher weights for the VDW and POLAR terms taken from SF5 but lacks the desolvation potential, demonstrates that it is the desolvation term itself that is having the beneficial effect, and not merely the reweighting of the other terms. The inclusion of the geometric aromatic term in SF8 and SF9 has a detrimental impact on the performance of SF3 and SF5 respectively.
Overall, SF5 achieves optimum performance across proteins and nucleic acids
(76.7 % within 2 Å RMSD). SF3 (no desolvation potential) and SF5 (with
desolvation potential) were selected as the best intermolecular scoring
functions. Finally, these two scoring functions, SF3 and SF5, were the ones
implemented in RxDock with the names of dock.prm
and dock_solv.prm
,
respectively.
Note
In virtual screening campaigns, or in experiments where score of different
ligands is compared, the best scoring poses for each molecule (as defined by
the lowest
Code implementation¶
Scoring functions for docking are constructed at run-time (by RbtSFFactory
class) from scoring function definition files (RxDock .prm
format). The
default location for scoring function definition files is
$RBT_ROOT/data/sf/
.
The total score is an aggregate of intermolecular ligand-receptor and
ligand-solvent interactions (branch SCORE.INTER
), intra-ligand interactions
(branch SCORE.INTRA
), intra-receptor, intra-solvent and receptor-solvent
interactions (branch SCORE.SYSTEM
), and external restraint penalties (branch
SCORE.RESTR
).
The SCORE.INTER
, SCORE.INTRA
and SCORE.SYSTEM
branches consist of
weighted sums of interaction terms as shown below. Note that not all terms
appear in all branches. See the rDock draft paper [rDock2014] for more details
on the implementation of these terms.
Term |
Description |
|
|
|
---|---|---|---|---|
VDW |
van der Waals |
|
|
|
VDW |
van der Waals (grid based) |
|
N/A |
N/A |
POLAR |
Attractive polar |
|
|
|
REPUL |
Repulsive polar |
|
|
|
SOLV |
Desolvation |
|
|
|
CONST |
Translation/rotational binding entropy penalty |
|
N/A |
|
ROT |
Torsional binding entropy penalty |
|
N/A |
N/A |
Two intermolecular scoring functions (SCORE.INTER
branch) have been
validated. These are known informally as the standard scoring function and the
desolvation scoring function (referred to as SF3 and SF5 respectively in the
rDock draft paper [rDock2014]). The standard intermolecular scoring function
consists of VDW, POLAR and REPUL terms. In the desolvation scoring function, the
REPUL term is replaced by a more finely parameterised desolvation potential
(SOLV term) based on a weighted solvent-accessible surface (WSAS) area model.
The ligand intramolecular scoring function (SCORE.INTRA
branch) remains
constant in both cases, and has the same terms and weights as the standard
intermolecular scoring function.
File |
Description |
---|---|
|
Intermolecular scoring function definition (standard scoring function, SF3) |
|
As above, but vdW term uses a precalculated grid |
|
Intermolecular scoring function definition (desolvation scoring function, SF5) |
|
vdW term only ( |
|
vdW term only ( |
|
vdW term parameter file |
|
Dihedral term parameter file |
|
Desolvation term parameter file |
Note
External restraint penalty terms are defined by the user in the system
definition .prm
file. Originally, rDock did not support flexible receptor
dihedrals or explicit structural waters, and the overall scoring function
consisted of just the SCORE.INTER
and SCORE.INTRA
branches. At that
time, the intermolecular scoring function definition file (e.g.
RbtInterIdxSF.prm
) represented precisely the SCORE.INTER
terms, and
the intramolecular definition file (RbtIntraSF.prm
) represented precisely
the SCORE.INTRA
terms. With the introduction of receptor flexibility and
explicit structural waters (and hence the need for the SCORE.SYSTEM
branch), the situation is slightly more complex. For implementation reasons,
many of the terms reported under SCORE.SYSTEM
(with the exception of the
dihedral term) are calculated simultaneously with the terms reported under
SCORE.INTER
, and hence their parameterisation is defined implicitly in
the intermolecular scoring function definition file. In contrast, the ligand
intramolecular scoring function terms can be controlled independently.
References¶
Hartshorn, M.J., Verdonk, M.L., Chessari, G., Brewerton, S.C., Mooij, W.T.M., Mortenson, P.N., and Murray, C.W. (2007). Diverse, High-Quality Test Set for the Validation of Protein-Ligand Docking Performance. J. Med. Chem. 50, 726–741. doi:10.1021/jm061277y
Verdonk, M.L., Chessari, G., Cole, J.C., Hartshorn, M.J., Murray, C.W., Nissink, J.W.M., Taylor, R.D., and Taylor, R. (2005). Modeling Water Molecules in Protein-Ligand Docking Using GOLD. J. Med. Chem. 48, 6504–6515. doi:10.1021/jm050543p
Wang, R., Fang, X., Lu, Y., and Wang, S. (2004). The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 47, 2977–2980. doi:10.1021/jm030580l
Wang, J., Wang, W., Huo, S., Lee, M., and Kollman, P.A. (2001). Solvation Model Based on Weighted Solvent Accessible Surface Area. J. Phys. Chem. B 105, 5055–5067. doi:10.1021/jp0102318
Gasteiger, J., Rudolph, C., and Sadowski, J. (1990). Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Computer Methodology 3, 537–547. 10.1016/0898-5529(90)90156-3
Hasel, W., Hendrickson, T.F., and Still, W.C. (1988). A rapid approximation to the solvent accessible surface areas of atoms. Tetrahedron Computer Methodology 1, 103–116. doi:10.1016/0898-5529(88)90015-2