.. _programs:

Programs
========

Programs summary tables:

.. table:: Core |Dock| C++ executables

   =================  ===========  ================================================================================
   Executable         Used for     Description
   =================  ===========  ================================================================================
   ``rbcavity``       Preparation  Cavity mapping and preparation of docking site (.as) file.
   ``rbcalcgrid``     Preparation  Calculation of vdW grid files (usually called by ``make_grid.csh`` wrapper script).
   ``rbdock``         Docking      The main |Dock| docking engine itself.
   =================  ===========  ================================================================================

.. table:: Auxiliary |Dock| programs

   =================  ===========  ================================================================================
   Executable         Used for     Description
   =================  ===========  ================================================================================
   ``sdtether``       Preparation  Prepares a ligand SD file for tethered scaffold docking. Annotates ligand SD file with tethered substructure atom indices. Requires Open Babel Python bindings.
   ``rbhtfinder``     Preparation  Used to optimise a high-throughput docking protocol from an initial exhaustive docking of a small representative ligand library. Parametrize a multi-step protocol for your system.
   ``make_grid.csh``  Preparation  Creates the vdW grid files required for grid-based docking protocols (``dock_grid.prm`` and ``dock_solv_grid.prm``). Simple front-end to ``rbcalcgrid``.
   ``rbconvgrid``     Analysis     Converts |Dock| vdW grids to InsightII grid format for visualisation.
   ``rbmoegrid``      Analysis     Converts |Dock| vdW grids to MOE grid format for visualisation.
   ``rblist``         Analysis     Outputs miscellaneous information for ligand SD file records.
   ``sdrmsd``         Analysis     Calculation of ligand Root Mean Squared Displacement (RMSD) between reference and docked poses, taking into account ligand topological symmetry. Requires Open Babel Python bindings.
   ``sdfilter``       Analysis     Utility for filtering SD files by arbitrary data field expressions. Useful for simple post-docking filtering by score components.
   ``sdsort``         Analysis     Utility for sorting SD files by arbitrary data field. Useful for simple post-docking filtering by score components.
   ``sdreport``       Analysis     Utility for reporting SD file data field values. Output in tab-delimited or CSV format.
   ``sdsplit``        Utility      Splits an SD file into multiple smaller SD files of fixed number of records.
   ``sdmodify``       Utility      Sets the molecule title line of each SD record equal to a given SD data field.
   =================  ===========  ================================================================================

Programs reference
------------------

rbdock
^^^^^^

``rbdock`` -- The |Dock| docking engine itself.

.. code-block:: bash

   $RBT_ROOT/bin/rbdock
   {-i input ligand MDL SD file}
   {-o output MDL SD file}
   {-r system definition.prm file}
   {-p docking protocol.prm file}
   [-n number of docking runs/ligand]
   [-s random seed]
   [-T debug trace level]
   [[-t SCORE.INTER threshold] | [-t filter definition file]]
   [-ap -an -allH -cont]

Simple exhaustive docking
"""""""""""""""""""""""""

The minimum requirement for ``rbdock`` is to specify the input (``-i``) and output (``-o``) ligand SD file names, the system definition .prm file (``-r``) and the docking protocol .prm file (``-p``). This will perform one docking run per ligand record in the input SD file and output all docked ligand poses to the output SD file. Use ``-n`` to increase the number of docking runs per ligand record.

High-throughput docking, option 1
"""""""""""""""""""""""""""""""""

The ``-t`` and ``-cont`` options can be used to construct high-throughput protocols.
If the argument following ``-t`` is numeric, it is interpreted as a threshold value for ``SCORE.INTER``, the total intermolecular score between ligand and receptor/solvent. In the absence of ``-cont``, the threshold acts as an early termination filter, and the docking runs for each ligand will be terminated early once the threshold value has been exceeded. Note that the threshold is applied only at the end of each individual docking run, not during the runs themselves.

If the ``-cont`` (continue) option is specified as well, the threshold acts as an output pose filter instead of a termination filter. The docking runs for each ligand run to completion as in the exhaustive case, but only the docking poses that exceed the threshold value of ``SCORE.INTER`` are written to the output SD file.

High-throughput docking, option 2
"""""""""""""""""""""""""""""""""

Alternatively, if the argument following ``-t`` is non-numeric, it is interpreted as a filter definition file. The filter definition file can be used to define multiple termination filters and multiple output pose filters in a generic way. Any docking score component can be used in the filter definitions. ``run_rbscreen.pl`` generates a filter definition file for multi-stage, high-throughput docking, with progressive score thresholds for early termination of poorly performing ligands. The use of filter definition files is preferred over the more limited ``SCORE.INTER`` filtering described above, whose use is now deprecated.

Automated ligand protonation/deprotonation
""""""""""""""""""""""""""""""""""""""""""

The ``-ap`` option activates the automated protonation of ligand positive ionisable centres, notably amines, guanidines, imidazoles, and amidines. The ``-an`` option activates the automated deprotonation of ligand negative ionisable centres, notably carboxylic acids, phosphates, phosphonates, sulphates, and sulphonates.
The precise rules used by |Dock| for protonation and deprotonation are quite crude, and are not user-customisable. Therefore these flags are not recommended for detailed validation experiments, in which care should be taken that the ligand protonation states are set correctly in the input SD file. Note that |Dock| is not capable of converting ionised centres back to the neutral form; these are unidirectional transformations.

Control of ligand non-polar hydrogens
"""""""""""""""""""""""""""""""""""""

By default, |Dock| uses an implicit non-polar hydrogen model for receptor and ligand, and all of the scoring function validation has been performed on this basis. If the ``-allH`` option is not defined (recommended), all explicit non-polar hydrogens encountered in the ligand input SD file are removed, and only the polar hydrogens (bonded to O, N, or S) are retained. If the ``-allH`` option is defined (not recommended), no hydrogens are removed from the ligand.

Note that |Dock| is not capable of adding explicit non-polar hydrogens if none exist. In other words, the ``-allH`` option disables hydrogen removal; it does not activate hydrogen addition. You should always make sure that polar hydrogens are defined explicitly. If the ligand input SD file contains no explicit non-polar hydrogens, the ``-allH`` option has no effect. Receptor protonation is controlled by the system definition .prm file.

rbcavity
^^^^^^^^

``rbcavity`` -- Cavity mapping and preparation of docking site (``.as``) file.

.. code-block:: bash

   $RBT_ROOT/bin/rbcavity
   {-r system definition .prm file}
   [-ras -was -d -v -s]
   [-l distance from cavity]
   [-b border]

Exploration of cavity mapping parameters
""""""""""""""""""""""""""""""""""""""""

``rbcavity -r .prm file``

You can run ``rbcavity`` with just the ``-r`` argument when first preparing a new receptor for docking.
This allows you to rapidly explore the impact of the cavity mapping parameters on the generated cavities, whilst avoiding the overhead of actually writing the docking site (``.as``) file to disk. The number of cavities and volume of each cavity are written to standard output.

Visualisation of cavities
"""""""""""""""""""""""""

``rbcavity -r .prm file -d``

If you have access to InsightII you can use the ``-d`` option to dump the cavity volumes in InsightII grid file format. There is no need to write the docking site (``.as``) file first. The InsightII grid files should be loaded into the reference coordinate space of the receptor and contoured at a contour level of 0.99.

Writing the docking site (.as) file
"""""""""""""""""""""""""""""""""""

``rbcavity -r .prm file -was``

When you are happy with the mapping parameters, use the ``-was`` option to write the docking site (``.as``) file to disk. The docking site file is a binary file that contains the cavity volumes in a compact format, and a pre-calculated cuboid grid extending over the cavities. The grid represents the distance from each point in space to the nearest cavity grid point, and is used by the cavity penalty scoring function. Calculating the distance grid can take a long time (whereas the cavity mapping itself is usually very fast), hence the ``-was`` option should be used sparingly.

Analysis of cavity atoms
""""""""""""""""""""""""

``rbcavity -r .prm file -ras -l distance``

Use the ``-l`` option to list the receptor atoms within a given distance of any of the cavity volumes, for example to determine which receptor OH/NH3+ groups should be flexible. This option requires access to the pre-calculated distance grid embedded within the docking site (``.as``) file, and is best used in combination with the ``-ras`` option, which loads a previously generated docking site file. This avoids the time-consuming step of generating the cavity distance grid again.
If ``-l`` is used without ``-ras``, the cavity distance grid will be calculated on the fly each time.

Miscellaneous options
"""""""""""""""""""""

The ``-s`` option writes out various statistics on the cavity and on the receptor atoms in the vicinity of the cavity. These values have been used in genetic programming model building for docking pose false positive removal. The ``-v`` option writes out the receptor coordinates in PSF/CRD format for use by the rDock Viewer (not documented here). Note that the PSF/CRD files are not suitable for simulation purposes, only for visualisation, as the atom types are not set correctly. The ``-b`` option controls the size of the cavity distance grid, and represents the border beyond the actual cavity volumes. It should not be necessary to vary this parameter (default = 8 Å) unless longer-range scoring functions are implemented.

rbcalcgrid
^^^^^^^^^^

``rbcalcgrid`` -- Calculation of vdW grid files (usually called by ``make_grid.csh`` wrapper script).

.. code-block:: bash

   $RBT_ROOT/bin/rbcalcgrid
   {-r system definition file}
   {-o output suffix for generated grids}
   {-p vdW scoring function prm file}
   [-g grid step]
   [-b border]

Note that, unlike ``rbdock`` and ``rbcavity``, spaces are not tolerated between the command-line options and their corresponding arguments. See ``$RBT_ROOT/bin/make_grid.csh`` for common usage.

make_grid.csh
^^^^^^^^^^^^^

Creates vdW grids for all receptor ``.prm`` files listed on the command line. Front-end to ``rbcalcgrid``.

rbconvgrid
^^^^^^^^^^

rbmoegrid
^^^^^^^^^

``rbmoegrid`` -- Calculates grids for a given atom type.

.. code-block:: bash

   rbmoegrid -o -r -p [-g -b -t ]
         -o (.grd is suffixed)
         -r - receptor param file (contains active site params)
         -p - scoring function param file (default calcgrid_vdw.prm)
         -g - grid step (default = 0.5A)
         -b - grid border around docking site (default = 1.0A)
         -t - Tripos atom type (default is C.3)

sdrmsd
^^^^^^

``sdrmsd`` -- Calculation of ligand root mean squared displacement (RMSD) between reference and docked poses. It takes into account molecule topological symmetry. Requires Open Babel Python bindings.

.. code-block:: bash

   $RBT_ROOT/bin/sdrmsd [options] {reference SD file} {input SD file}

With two arguments
""""""""""""""""""

``sdrmsd`` calculates the RMSD between each record in the input SD file and the first record of the reference SD file. If there is a mismatch in the number of atoms, the record is skipped and the RMSD is not calculated. The RMSD is calculated over the heavy (non-hydrogen) atoms only. Results are written to standard output. If any record is skipped, a warning message is printed to standard error.

With fitting
""""""""""""

A molecular superposition is performed before the RMSD is calculated. The output specifies that an RMSD FIT calculation was done.

.. code-block:: bash

   sdrmsd -f reference.sdf input.sdf
   sdrmsd --fit reference.sdf input.sdf

Output an SD file
"""""""""""""""""

This option writes an output SD file containing the input molecules, with an extra RMSD field added to each record. If fitting was done, the molecule coordinates will also be fitted to the reference.

.. code-block:: bash

   sdrmsd -o output.sdf reference.sdf input.sdf
   sdrmsd --out=output.sdf reference.sdf input.sdf

sdtether
^^^^^^^^

``sdtether`` -- Prepares a ligand SD file for tethered scaffold docking. Requires Open Babel Python bindings. Annotates ligand SD file with tethered substructure atom indices.

.. code-block:: bash

   $RBT_ROOT/bin/sdtether {ref. SDfile} {in SDfile} {out SDfile} "{SMARTS query}"

``sdtether`` performs the following actions:

* Runs the SMARTS query against the reference SD file to determine the tethered substructure atom indices and coordinates.
* If more than one substructure match is retrieved (e.g. due to topological symmetry, or if the query is too simple), all substructure matches are retained as the reference and all ligands will be tethered according to all possible matches.
* Runs the SMARTS query against each record of the input ligand SD file in turn.
* For each substructure match, the ligand coordinates are transformed such that the principal axes of the matching substructure coordinates are aligned with the reference substructure coordinates.
* In addition, an SD data field is added to the ligand record which lists the atom indices of the substructure match, for later retrieval by |Dock|.
* Each transformed ligand is written to the output SD file.
* Note that if the SMARTS query returns more than one substructure match for a ligand, that ligand is written multiple times to the output file, once for each match, each of which will be docked independently with different tethering information.

sdfilter
^^^^^^^^

``sdfilter`` -- Post-process an SD file by filtering the records according to data fields or attributes.

::

   sdfilter -f '$ ' [-s ] [sdFiles]

or

::

   sdfilter -f [-s ] [sdFiles]

.. note::

   Multiple filters are allowed and are OR'd together. Filters can be provided in a file, one per line.

   Standard Perl operators should be used, e.g.

   .. code-block:: perl

      eq ne lt gt le ge   # for strings
      == != < > <= >=     # for numeric

   ``_REC`` (record #) is provided as a pseudo-data field.

   If the ``-s`` option is used, ``_COUNT`` (#occurrences of DataField) is provided as a pseudo-data field.

   If the SD file list is not given, reads from standard input.

   Output is to standard output.
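
The ``_REC`` and ``_COUNT`` pseudo-data fields described in the note can be pictured with a short Python sketch. It is not the ``sdfilter`` implementation: the function names are hypothetical, the SD parsing is deliberately minimal, and it assumes that ``_COUNT`` is a running count of the records seen so far that share the same value of the ``-s`` data field.

.. code-block:: python

   from collections import defaultdict


   def split_records(sd_text):
       """Split SD text into records; each record is terminated by '$$$$'."""
       records, current = [], []
       for line in sd_text.splitlines():
           if line.strip() == "$$$$":
               records.append(current)
               current = []
           else:
               current.append(line)
       return records


   def data_fields(record):
       """Map each '> <Field>' header to the first line of its value."""
       fields = {}
       for i, line in enumerate(record):
           if line.startswith(">") and "<" in line and line.rstrip().endswith(">"):
               name = line[line.index("<") + 1:line.rindex(">")]
               fields[name] = record[i + 1].strip() if i + 1 < len(record) else ""
       return fields


   def pseudo_fields(sd_text, summary_field):
       """Yield (_REC, _COUNT, value) per record (assumed semantics, see text)."""
       seen = defaultdict(int)
       for rec_no, record in enumerate(split_records(sd_text), start=1):
           value = data_fields(record).get(summary_field, "")
           seen[value] += 1  # running count per distinct field value
           yield rec_no, seen[value], value


   def make_record(name, score):
       """Build a minimal SD-like record (no atoms) for illustration only."""
       return "\n".join([name, "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
                         "M  END", "> <Name>", name, "", "> <SCORE>", str(score),
                         "", "$$$$"])


   if __name__ == "__main__":
       sample = "\n".join(make_record(n, s) for n, s in
                          [("ligA", -25.1), ("ligA", -22.0), ("ligB", -19.4)])
       for rec, count, value in pseudo_fields(sample, "Name"):
           if count == 1:  # analogous to the filter '$_COUNT == 1'
               print(rec, value)

Under these assumptions, keeping only the records with ``_COUNT == 1`` retains the first record for each distinct value of the chosen field.
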
For example, if ``results.sd`` contains multiple ligands each having multiple poses (ordered by score), then running

::

   sdfilter -f'$_COUNT == 1' results.sd

will get you the first entry for each ligand.

sdreport
^^^^^^^^

``sdreport`` -- Produces text summaries of SD records.

::

   sdreport [-l] [-t []] [-c ] [-id ] [-nh] [-o] [-s] [-sup] [sdFiles]

   -l     (list format) output all data fields for each record as processed
   -t     (tab format) tabulate selected fields for each record as processed
   -c     (csv format) comma delimited output of selected fields for each record as processed
   -s     (summary format) output summary statistics for each unique value of ligand ID
   -sup   (supplier format) tabulate supplier details (from Catalyst)
   -id    data field to use as ligand ID
   -nh    don't output column headings in -t and -c formats
   -o     use old (v3.00) score field names as default columns in -t and -c formats, else use v4.00 field names
   -norm  use normalised score field names as default columns in -t and -c formats (normalised = score / #ligand heavy atoms)

.. note::

   If ``-l``, ``-t`` or ``-c`` are combined with ``-s``, the listing/table is output within each ligand summary.

   ``-sup`` should not be combined with other options.

   Default field names for ``-t`` and ``-c`` are RiboDock score field names.

   Default ID field name is ``Name``.

   If ``sdFiles`` is not given, reads from standard input.

   Output is to standard output.

sdsplit
^^^^^^^

``sdsplit`` -- Splits SD records into multiple files of equal size.

::

   sdsplit [-] [-o ] [sdFiles]

   -    record size to split into (default = 1000 records)
   -o   Root name for output files (default = tmp)

.. note::

   If the SD file list is not given, reads from standard input.

sdsort
^^^^^^

``sdsort`` -- Sorts SD records by given data field.

::

   sdsort [-n] [-r] [-f ] [sdFiles]

   -n    numeric sort (default is text sort)
   -r    descending sort (default is ascending sort)
   -f    specifies sort field
   -s    fast mode. Sorts the records for each named compound independently (must be consecutive)
   -id   specifies compound name field (default = 1st title line)

.. note::

   ``_REC`` (record #) is provided as a pseudo-data field.

   If the SD file list is not given, reads from standard input.

   Output is to standard output.

   Fast mode can be safely used for partial sorting of huge SD files of raw docking hits without running into memory problems.

sdmodify
^^^^^^^^

``sdmodify`` -- Script to set the first title line equal to a given data field.

::

   sdmodify -f [sdFiles]

.. note::

   If ``sdFiles`` is not given, reads from standard input.

   Output is to standard output.

rbhtfinder
^^^^^^^^^^

``rbhtfinder`` -- Script that simulates the result of a high-throughput protocol.

::

   1st) exhaustive docking of a small representative part of the whole library.
   2nd) Store the result of sdreport -t over that exhaustive dock. in file that will be the input of this script.
   3rd) rbhtfinder
        and are the number of steps in stage 1 and in stage 2. If not present, the default values are 5 and 15
        and setup the range of thresholds that will be simulated in stage 1. The threshold of stage 2 depends on the value of the threshold of stage 1.
        An input of -22 -24 will try protocols:
           5 -22 15 -27
           5 -22 15 -28
           5 -22 15 -29
           5 -23 15 -28
           5 -23 15 -29
           5 -23 15 -30
           5 -24 15 -29
           5 -24 15 -30
           5 -24 15 -31
   Output of the program is a table of 7 columns.
   First column represents the time. This is a percentage of the time it would take to do the docking in exhaustive mode, i.e. docking each ligand 100 times. Anything above 12 is too long.
   Second column is the first percentage. Percentage of ligands that pass the first stage.
   Third column is the second percentage. Percentage of ligands that pass the second stage.
   The last four columns represent the protocol.
   All the protocols tried are written at the end.
   The ones for which time is less than 12%, perc1 is less than 30% and perc2 is less than 5% but bigger than 1% will have a series of *** after, to indicate they are good choices
   WARNING! This is a simulation based on a small set. The numbers are an indication, not factual values.

An example filter file would look as follows:

::

   # 3 steps as the running filters (set by the "3" in next line)
   3
   if - -10 SCORE.INTER 1.0 if - SCORE.NRUNS 9 0.0 -1.0,
   if - -20 SCORE.INTER 1.0 if - SCORE.NRUNS 14 0.0 -1.0,
   if - SCORE.NRUNS 49 0.0 -1.0,
   # 1 writing filter (defined by the "1" in next line)
   1
   - SCORE.INTER -10,

In other (more understandable) words, |Dock| first runs 3 consecutive steps:

1. Perform 10 runs and check whether ``SCORE.INTER`` is lower than -10. If so:
2. Perform 5 more runs (15 in total) and check whether ``SCORE.INTER`` reaches -20. If so:
3. Continue up to 50 runs to freely sample the different conformations the molecule displays.

Second, for output, only those poses where ``SCORE.INTER`` is better than -10 are written (to avoid excessive printing).

rblist
^^^^^^

``rblist`` -- Output interaction center info for ligands in SD file (with optional autoionisation).

.. code-block:: bash

   rblist -i [-o ] [-ap ] [-an] [-allH]
         -i - input ligand SD file
         -o - output SD file with descriptors (default = no output)
         -ap - protonate all neutral amines, guanidines, imidazoles (default = disabled)
         -an - deprotonate all carboxylic, sulphur and phosphorous acid groups (default = disabled)
         -allH - read all hydrogens present (default = polar hydrogens only)
         -tr - rotate all secondary amides to trans (default = leave alone)
         -l - verbose listing of ligand atoms and rotatable bonds (default = compact table format)