Structure Based Virtual Screening Resources

Harmine.png    Our research lab uses a number of programs to perform in silico screening assays (the testing of molecules with a computer program).  To do this, we first build a model of a target protein using either known crystal structures from NCBI or using homology models.  We then create a virtual library of molecules from either publicly available databases (such as the ZINC database) or using in-house software such as Spartan.  Although it is possible to test hundreds of millions of compounds, we typically try to screen less than 10 million compounds in order to keep the processing time reasonable.  We generally filter the large databases to more reasonable numbers by selecting for smaller molecules and those with good ADMET properties.   The programs we usually use to preform these include Autodock, Vina, and FRED.  Once the jobs are processed, the programs will indicate which molecules are suspected to be potent.  We then will synthesize the compounds and test them in a living system to see if the virtual screen results are correct.  When possible, we try to test several programs with real biological data prior to beginning a virtual screen to minimize false hits.

Collaborations

    If you are looking for someone to run a structure-based virtual screen, we're always interested in collaborations!  We specialize in testing for commercially available compounds that can be purchased and immediately tested in cellular assays, but we are able to run other types of screens as well.  A typical docking library is 7-10 million compounds, and our libraries can be tailored for a specific range of properties (molecular weight, polar surface area, log P, etc.).  We do not do molecular dynamics or ligand-based screens, sadly.

Ligand Libraries


    Our group maintains a collection of several in-house virtual libraries as PDBQT files.  Our most commonly used libraries feature a collection of all drug-like, commercially available molecules (updated annually).  If you would like a copy of our virtual libraries, please contact us.

Software

We use the following support software most commonly:
  • Open Babel
  • Autodock Vina
  • Smina
  • iDock
  • ChimeraX
  • Cygwin (for a Windows machine)
  • Notepad++
  • ChemSketch
  • ACD NMR Processor
With the exception of the ACD NMR processor, all of this is available as free open-source software.  Our lab is a big supporter of open-source software, and we gratefully acknowledge the developers of these programs.  Most of these programs also have excellent web resources for beginners.  Our lab would also like to thank ACD labs for their generous donation of the NMR processing software we use.

Preparing the Protein Target

     Our protocol for preparing the protein target can be found in a PDF document with step-by-step instructions hereAlternatively, you can use the older python script from AutoDock, which you can find info about here.

iDock Virtual Screening

     We find iDock to be the fastest free program out there and highly recommend it for a quick job.  It's also a well-written program that can get maximum usage out of a system.  iDock can be downloaded here.  To use run iDock, you'll need to create a directory with the following items in it:

  1. A PDBQT file of the protein target named foo_DOCK.pdbqt (where "foo" is usually the PDB code)
  2. The iDock program
  3. A directory named Ligands that contains all of the input ligand files in PDBQT format.  All input ligands must have a name beginning with the word "ligand" like this:  ligand02_1554.pdbqt.  These ligands also all need to have a "Name" in the file.  This is most commonly a ZINC ID code, or something similar.  Our script will look for this to help reference the compounds later.
  4. A proper configuration file called idock.conf with the ligand folder, protein target, and coordinates set up.  In our example, we have placed all ligand files in a directory called "Ligands," and our protein target is called 2vt4_DOCK.pdbqt.
    • The configuration file above is set to use 6 cores and with default values.  Time per ligand can vary a lot based on system resources and the size of input files, but iDock is very fast and most ligands will be processed in less than a minute.  We have found little improvement by increasing the default values based on a benchmarking study we did here.
  5. The screen_idock_v4.2.sh script for a Linux system or the screen_idock_Win_v4.2.sh script for Windows.


Once everything is ready, simply execute the script in the background and let the job run.

$ nohup ./screen_idock_v4.2.sh &
Once the script is running, it will create a new directory called Results, then it will extract the "Name" found in all input ligand files and compile them into a file called IDs.txt.  It will then begin docking all ligands in the Ligands folder.  Once all ligands have been docked, it will then extract all the results, couple the ID for each ligand to its best binding score, then sort the results into a final summary file called Summary_Final.txt.  You can convert the output PDBQT files into PDB files using Open Babel.

Vina Virtual Screening

     Vina is a slightly older program and somewhat slower than iDock, but it's tried and true and available for Linux, PC and Mac.  We usually only use Vina if it gives results that most closely match a known set of benchmarks.  Vina can be downloaded here

     We have designed our scripts so that you can use the same type of input files and directory structure, so you'll notice that the format is very similar to our iDock procedure above.  To run a virtual screen with our Vina script, you'll need to have the following things in the same directory:

  1. A PDBQT file of the protein target named foo_DOCK.pdbqt
  2. The Vina program (which is contained in a directory called Vina)
  3. A directory named Ligands that contains all of the input ligand files in PDBQT format.  All input ligands must be named with the word "ligand" in front as with the iDock screen (e.g. ligand02_1554.pdbqt).
  4. A properly formatted conf.txt file for Vina to read.  You can see an example here where the "#" entries need to be filled in (see the Docking the Classroom page for a full guide).
    • Running timing tests are a good idea.  Generally, 6-12 CPUs are best depending on your setup.  Much larger numbers can actually make the program slower.
    • We find the default settings with 10 conformations works well (see our example file above).
    • As mentioned before, we don't see much improvement in docking accuracy when we increase the exhaustiveness setting, so we leave it at the default.
  5. The screen_Vina_v4.2.sh script for a Linux system, or the screen_Vina_Win_v4.2.sh script for a Windows machine.

Once you're ready, just execute the script in the background with:

$ nohup ./screen_Vina_v4.2.sh &
Note that if you get a "permission denied" message (which sometimes appears as a bad input message on a PC or a cowardly statement claiming it can't find the conf.txt), just use the chmod command and try again:

$ chmod -R 775 ./
$ nohup ./screen_Vina_v4.2.sh &
Once the script begins running, it will grab the "Name" entry from each ligand file as before and place the compiled names in the IDs.txt file in a new directory called Results (same as iDock).  It will then go through and check all input PDBQT files for problem lines that can cause Vina to fail.  This may take a few minutes.  The script will then dock each ligand in the Ligands file.  Once done, it will again extract the best energies, rank them, and match them by "Name" in the IDs.txt file.  The final output file will be called Summary_Final.txt.  Output PDBQT files can be converted to PDB files using Open Babel.

    You can find a detailed procedure on how to run a virtual screen with our scripts by looking at the tutorials found on our Docking in the Classroom page.

Smina Virtual Screening

     Smina is a fork of Vina that can be found here.  If you have no benchmarks to test different programs, we have found that Smina works the best based on a benchmarking study with the DUD-E database (link).  Sadly, Smina is only available for Linux systems.  If you want to run a Smina screen, then you can use our screen_Smina_v4.2.sh script.  Your input directory should be configured exactly the same way as with Vina (with identical configuration file format), with the only difference being that you need to have the smina.static program instead of the Vina program.

AutoDock Virtual Screening

    We no longer use AutoDock, as it's very slow and cannot thread.  Recent papers seem to make convincing cases that it gives inferior performance to boot.  Not a bad program, but better things are out there.


Virtual Screening in the Classroom

    If you are a professor and are interested in teaching virtual screening, it can be done!  See our Docking in the Classroom page for a full tutorial.


Ligand Based Screening Resources

     We normally only target proteins with known structures, so we have little experience with ligand based screening techniques.  However, a number of open-source programs, including installation and usage instructions, can be found on the Insilab website here.  Thanks to Praveen Kumar M for the suggested link.

MPiDock

     The MPiDock program is a program designed to run on networked terminals with iDock and Vina.  At the outermost level, it consists of a scheduler, written as a command line interface for crontab, which then calls the batcher, which is a program that divides up input files and executes commands remotely with those input files, allowing parallelization across all terminals.  The program is freely available on GitHub here.  MPiDock was developed at Cal Poly by Dennis Li, Derek Nola and Lucy Bowen (pictured below, left to right)
MPiDock_group.JPG