High-throughput screening (HTS) is becoming a routine method for identifying probes used in chemical biology. The flexibility of the process has allowed numerous and disparate areas of biology to engage with an equally diverse palette of chemistry. However, with this diversity, the many components associated with HTS (the bioassay, the assay format, the nature of the library, reagent and sample delivery methods, the detection instrumentation, the level of automation, and the data analysis algorithms) all conspire to make HTS one of the least standardized processes used in academia today. Nonetheless, HTS processes share enough common features to allow reporting of fundamental parameters of the assay, screen, library and outcome. Such data are useful for gaining insight into these HTS processes and for comparing results between screens.

Here we suggest the data and descriptive information that HTS protocols in manuscripts and databases should include to provide a basis for evaluating, comparing and replicating small-molecule screens. We have divided small-molecule screen protocol information into five categories: the assay, the library, the HTS process, the post-HTS analysis of data and compound structures, and the screen results (Fig. 1). Within each category, we describe key pieces of information that are important for interpreting and replicating a screen (Table 1).

Figure 1 The flow of materials and data from assay to reported results in HTS.

Table 1 Reporting parameters for small-molecule screening data

Assay

Assays fall into three general types: isolated molecular target assays, cell-free multicomponent assays, and cell- or organism-based assays. Assays on purified enzymes such as proteases or kinases, and assays on activities associated with cell extracts, membranes or reconstituted signaling cascades, are representative examples of the first two assay types. Cellular assays can be subdivided into 'reporter gene'–type assays and phenotypic assays that measure outputs resulting from intact cellular processes.

A description of the logic behind the assay, including positive and negative control conditions, is critical to understanding how library compounds will be scored as active in the assay. This description also indicates the types of interference to which the assay may be sensitive. Positive controls are conditions (for example, small-molecule addition, RNA interference knockdown or mutations) that produce the same result in the assay as a desired active compound. Negative controls are usually 'vehicle'-only conditions (for example, DMSO) or, where appropriate, small molecules demonstrated to have no activity in the assay. Controls are used to determine an assay 'window' and to validate the biological response. It is often desirable to indicate how effective the controls are. For example, the assay strategy for a fluorescence polarization assay might be described as follows:

This screen was carried out to identify compounds that disrupt formation of the X–Y protein complex. The assay uses fluorescence polarization to monitor binding of a fluorescein-labeled peptide, X36* (derived from the protein X extracellular domain), to full-length protein Y. Screening positives are compounds that block binding of X36* to protein Y. Addition of unlabeled X36 peptide (5 μM) completely blocks binding of X36* in this assay and is used as a positive control. DMSO alone is the best negative control for this assay during HTS.

For some assays (in particular for cell-free assays), it is possible to get quantitative measurements of the dynamic range and sensitivity of the assay. When available, this information is useful to help evaluate the quality of an assay. For example, this information might be presented for a fluorescence polarization assay as “This assay was linear in the range of 60 mP units (the polarization observed for free peptide) to 170 mP units (the polarization observed for fully bound peptide) with an error of ± 5 mP units. Under screening conditions, we estimate that differences of 5 to 10% of peptide bound could be reliably distinguished.”
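The relationship between the polarization limits and the quoted resolution can be checked with a short calculation. The sketch below assumes a simple two-state model in which the fluorophore's brightness does not change on binding, so that polarization varies linearly between the free (60 mP) and fully bound (170 mP) limits; that linearity is an assumption, not something stated in the example. The function names are hypothetical.

```python
# Sketch: map a fluorescence polarization reading (mP) onto fraction of
# labeled peptide bound. ASSUMPTION: a two-state model in which the
# fluorophore's brightness does not change on binding, so polarization
# varies linearly between the free and fully bound limits.

P_FREE_MP = 60.0    # polarization of free peptide (from the example text)
P_BOUND_MP = 170.0  # polarization of fully bound peptide (from the example text)
ERROR_MP = 5.0      # reported measurement error (+/- mP)


def fraction_bound(reading_mp: float,
                   p_free: float = P_FREE_MP,
                   p_bound: float = P_BOUND_MP) -> float:
    """Linear interpolation of a reading between the free and bound limits."""
    return (reading_mp - p_free) / (p_bound - p_free)


def bound_resolution(error_mp: float = ERROR_MP,
                     p_free: float = P_FREE_MP,
                     p_bound: float = P_BOUND_MP) -> float:
    """Change in fraction bound equivalent to one unit of measurement error."""
    return error_mp / (p_bound - p_free)


if __name__ == "__main__":
    print(f"115 mP corresponds to {fraction_bound(115.0):.0%} bound")
    print(f"+/-{ERROR_MP:.0f} mP corresponds to +/-{bound_resolution():.1%} bound")
```

Under this model a ±5 mP error spans about 4.5% of peptide bound (5/110), which is in line with the estimate that differences of 5 to 10% can be reliably distinguished.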

The sources of all reagents used in the screen should be documented. Catalog and batch numbers (if relevant) for all commercially available components should be listed. Descriptions of how all noncommercial reagents were generated or obtained should be provided. Availability of proprietary reagents should be described. It is helpful if the amounts of key assay reagents required to screen a library of a defined size in a specified assay format are mentioned explicitly.
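Stating reagent requirements for a library of defined size is simple arithmetic once the plate layout is fixed. The sketch below is a hypothetical planning helper: the number of control wells per plate, the volume per well and the 20% overage for dispenser dead volume are illustrative assumptions, not values from any particular screen.

```python
import math

# Hypothetical planning helper: estimate plate count and reagent volume
# needed to screen a library of a given size. The plate layout, volume per
# well and overage (dead-volume allowance) are illustrative assumptions.


def plates_needed(n_compounds: int, sample_wells_per_plate: int) -> int:
    """Number of assay plates when each sample well receives one compound."""
    return math.ceil(n_compounds / sample_wells_per_plate)


def reagent_volume_ml(n_plates: int, wells_per_plate: int,
                      ul_per_well: float, overage: float = 0.2) -> float:
    """Total reagent volume (mL) for every well, including control wells,
    plus a fractional overage for dispenser dead volume."""
    total_ul = n_plates * wells_per_plate * ul_per_well * (1.0 + overage)
    return total_ul / 1000.0


if __name__ == "__main__":
    # 50,000 compounds in 384-well plates, assuming 64 control wells per plate
    plates = plates_needed(50_000, 384 - 64)
    volume = reagent_volume_ml(plates, 384, ul_per_well=10.0)
    print(plates, "plates;", round(volume, 1), "mL of reagent")  # 157 plates; 723.5 mL of reagent
```

Reporting the inputs (library size, sample wells per plate, volume per well) alongside the total lets readers rescale the estimate for their own format.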

A clear summary of the assay protocol should be provided for each screen. The instruments used to perform each step should be indicated. This can be written in paragraph form; for example, “Assay plates were filled with 1,000 HeLa cells per well in 30 μl of medium (Matrix Wellmate plate filler) and incubated in a Liconic STX200 incubator for 16 hours before compound addition....” Presenting HTS assay protocols in table form is also a good option, especially for more complex protocols. Table 2 shows an example adapted from Davis et al.¹.

Table 2 Example HTS assay protocol table

Library

To allow others to evaluate the results and to replicate the work in the future, it is essential to describe the screened library in sufficient detail. First, the composition of the library should be described (for instance, peptides, natural products or known drugs), with corresponding detail about the types of compounds represented (for instance, core scaffolds or privileged structures). The size of the library (number of members) and how samples are presented to the assay (for instance, arrayed individually or pooled) should be stated. If the library was screened as mixtures, the procedures needed to identify the active compound in a mixture (deconvolution) should be described. For natural products, the nature of the samples should be described (that is, are the samples purified compounds with determined structures, or are extracts being used?). Finally, the source of the library and the quality control procedures used to acquire and maintain it should be reported. If available, a web link providing additional library information can be included. When files detailing the composition of a library are provided, they should, where possible, include structures and unique or vendor identifiers. Example:
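A library file can be checked for structures and identifiers before it is deposited. The sketch below is a deliberately minimal, hand-rolled reader for SD files (records end with a `$$$$` line and data items are introduced by `> <TAG>` headers); a dedicated cheminformatics toolkit is preferable in practice. The identifier field name `VENDOR_ID` is a hypothetical example.

```python
# Minimal, hand-rolled SD-file check (illustrative only; a cheminformatics
# toolkit is preferable in practice). It verifies that every record has a
# structure block and a non-empty identifier field. The field name
# "VENDOR_ID" is a hypothetical example.


def iter_sd_records(text: str):
    """Yield each SD record as a list of lines; a '$$$$' line ends a record."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record  # tolerate a missing final '$$$$'


def data_fields(record: list) -> dict:
    """Parse '> <TAG>' data items that follow the 'M  END' line."""
    end = next((i for i, ln in enumerate(record) if ln.startswith("M  END")), -1)
    fields, tag = {}, None
    for line in record[end + 1:]:
        start = line.find("<")
        stop = line.find(">", start + 1) if start != -1 else -1
        if line.startswith(">") and stop != -1:
            tag = line[start + 1:stop]
            fields[tag] = []
        elif tag is not None and line.strip():
            fields[tag].append(line.strip())
        else:
            tag = None  # a blank line ends the current data item
    return {k: "\n".join(v) for k, v in fields.items()}


def check_library(text: str, id_field: str = "VENDOR_ID") -> list:
    """Return (record_number, problem) pairs; an empty list means no problems."""
    problems = []
    for n, record in enumerate(iter_sd_records(text), start=1):
        if not any(ln.startswith("M  END") for ln in record):
            problems.append((n, "missing structure block"))
        if not data_fields(record).get(id_field):
            problems.append((n, f"missing {id_field}"))
    return problems
```

Running such a check on the files supplied with a manuscript catches records that would otherwise be impossible to match back to a vendor or internal identifier.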

The library screened consisted of 50,000 compounds arrayed in 384-well plates as single compounds at 10 mM in DMSO (additional detail describing the library may be obtained from the links included in the supplementary information). The quality of all compounds was assured by the vendor as greater than 90% pure, with provided quality control data; this was verified internally on 5% random sampling. The library was screened at a constant 1:1,000 dilution, with a 10-μM final concentration of compound in each well (0.1% DMSO). Supplementary information: http://www.msdiscovery.com/spectrum.html for library .xls or .sdf files.
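The concentrations in this example follow directly from the dilution factor. The short sketch below reproduces them; the 30-μl final well volume used to compute a transfer volume is a hypothetical value chosen only for illustration, and the DMSO calculation assumes the stock is in neat DMSO.

```python
# Arithmetic behind the library example above: a 10 mM DMSO stock diluted
# 1:1,000 into assay medium. The 30-uL final well volume is a hypothetical
# value used only to show the corresponding transfer volume.


def final_concentration_um(stock_mm: float, dilution: float) -> float:
    """Final compound concentration in uM (1 mM = 1,000 uM)."""
    return stock_mm * 1000.0 / dilution


def final_dmso_percent(stock_dmso_percent: float, dilution: float) -> float:
    """Final vehicle concentration, assuming the stock is in neat DMSO."""
    return stock_dmso_percent / dilution


def transfer_volume_nl(final_volume_ul: float, dilution: float) -> float:
    """Stock volume (nL) that makes up 1/dilution of the final well volume."""
    return final_volume_ul * 1000.0 / dilution


if __name__ == "__main__":
    print(final_concentration_um(10.0, 1000))   # 10.0 (uM)
    print(final_dmso_percent(100.0, 1000))      # 0.1 (% DMSO)
    print(transfer_volume_nl(30.0, 1000))       # 30.0 (nL into a 30-uL well)
```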

HTS process

Though most HTS laboratories use microtiter plate–based platforms, the following descriptive information should also serve as a general guide for non-plate-based processes. Assay controls are critical for evaluating the assay response and the performance of any assay. For microtiter plate assays they are often arranged as follows. Interplate controls are used to assess and, when possible, correct systematic variations in the biological response over time (for example, slow clogging of a dispenser tip). Intraplate controls are essential for establishing the assay window (as described above), and over the course of a screen they permit analysis of the uniformity of the biological response. The following two examples from Davis et al.¹ illustrate how this information can be described:

Interplate controls: plates containing vehicle only (in place of test compounds) were uniformly distributed throughout the screen at ten-plate intervals to monitor systematic variation in background.

Intraplate controls: columns 1–4 of the 1,536-well plates were used for arraying of controls. Columns 1 and 2 contained a 16-point dose-response curve of MG132, with each concentration present in duplicate. In column 3, the top 24 wells contained doxycycline only (for use as normalization to the minimum signal), and the bottom 8 wells contained assay medium alone for use as a reference for induction by doxycycline. In column 4, the top 24 wells contained the highest concentration of MG132 (for use as a 100% activation reference), and the bottom 8 wells again contained assay medium alone.
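A plate layout described in this way can also be captured as a simple well map, which makes it easy to count how many wells remain for library compounds. The sketch below encodes columns 1 to 4 of a 1,536-well plate (32 rows × 48 columns) with 1-based indices. Because the text does not say how the individual MG132 concentrations are placed within columns 1 and 2, those wells carry a generic label, and the rule of one vehicle-only plate per ten compound plates is an assumed reading of "ten-plate intervals".

```python
# Sketch of the intraplate control layout described above for a 1,536-well
# plate (32 rows x 48 columns), using 1-based row/column indices. The
# assignment of individual MG132 concentrations within columns 1 and 2 is
# not specified in the text, so those wells are labeled generically.

ROWS, COLS = 32, 48


def control_layout() -> dict:
    """Map (row, column) -> control label for columns 1-4."""
    layout = {}
    for row in range(1, ROWS + 1):
        layout[(row, 1)] = "MG132 titration"
        layout[(row, 2)] = "MG132 titration"
        # top 24 wells: doxycycline only (minimum-signal normalization)
        layout[(row, 3)] = "doxycycline only" if row <= 24 else "medium only"
        # top 24 wells: highest MG132 concentration (100% activation)
        layout[(row, 4)] = "MG132 maximum" if row <= 24 else "medium only"
    return layout


def sample_wells_per_plate() -> int:
    """Wells left for library compounds after the control columns."""
    return ROWS * COLS - len(control_layout())


def interplate_control_plates(n_compound_plates: int, interval: int = 10) -> int:
    """Vehicle-only plates, ASSUMING one is inserted per `interval` compound plates."""
    return n_compound_plates // interval


if __name__ == "__main__":
    print(sample_wells_per_plate())            # 1408
    print(71 * sample_wells_per_plate())       # 99968
    print(interplate_control_plates(71))       # 7
```

These numbers are consistent with the screen description that follows: 71 compound plates of 1,408 sample wells hold roughly 100,000 compounds, and adding 7 interplate control plates gives the 78 plates screened.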

The number of assay plates and the screen duration are useful metrics that should be incorporated into a description of the HTS. Reagent and compound dispensing systems, detectors (type, model and settings) and output mode (end point, kinetic and so on) should be described in adequate detail. The availability of custom detectors or data-handling methods should be stated, as should details critical to optimizing output reads, such as spectral overlap ('spillover') corrections for multifluorophore² or multireporter gene–based assays¹. If these instruments are peripheral devices on a robotic platform, details of the software controlling the system should also be included. Two examples:

Seventy-eight 1,536-well plates, representing a library of 100,000 compounds plus interplate controls (7 plates), were screened over an 8-h period. Cells were dispensed with a single-channel BioRAPTR FRD (Beckman Coulter, Inc.) into 1,536-well microtiter plates that had previously been loaded with compound using an Echo555 liquid handler (Labcyte, Inc.). Plates were read on an EnVision Multilabel Plate Reader (Perkin Elmer, Inc.) using 405-nm excitation and dual emission at 460 and 530 nm (bottom read).

Target enzyme was added to 384-well plates using a Freedom EVO75 (Tecan), followed by addition of compound with a 384-hydrophobic-coated FP3S100 pin tool array (V&P Scientific).
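Spillover corrections for a dual-emission read can be expressed as a small linear system. The sketch below is a generic two-channel linear unmixing, not the specific correction used by Davis et al. It assumes background-subtracted signals, a linear detector response, and spillover coefficients measured in wells containing only one emitter. The "blue" and "green" names stand in for the 460-nm and 530-nm channels, and the coefficient values are hypothetical.

```python
# Generic two-channel spillover (crosstalk) correction by linear unmixing.
# ASSUMPTIONS: background-subtracted signals, a linear detector response, and
# spillover coefficients measured in wells containing only one emitter.
# "blue" and "green" stand in for the 460-nm and 530-nm channels; the
# coefficient values are hypothetical.


def spillover_coefficient(own_channel: float, other_channel: float) -> float:
    """Fraction of a single emitter's signal that appears in the other channel."""
    return other_channel / own_channel


def correct_two_channel(obs_blue: float, obs_green: float,
                        s_blue_to_green: float, s_green_to_blue: float):
    """Invert obs_blue = blue + s_green_to_blue * green and
    obs_green = s_blue_to_green * blue + green for the true signals."""
    det = 1.0 - s_blue_to_green * s_green_to_blue
    if abs(det) < 1e-12:
        raise ValueError("spillover matrix is singular")
    blue = (obs_blue - s_green_to_blue * obs_green) / det
    green = (obs_green - s_blue_to_green * obs_blue) / det
    return blue, green


if __name__ == "__main__":
    s_bg = spillover_coefficient(own_channel=1000.0, other_channel=100.0)  # 0.1
    s_gb = spillover_coefficient(own_channel=2000.0, other_channel=100.0)  # 0.05
    print(correct_two_channel(1020.0, 500.0, s_bg, s_gb))
```

Reporting the measured coefficients (and how the single-emitter wells were prepared) is what allows another laboratory to reproduce the corrected values.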

Correction and normalization procedures describe how screening results were corrected for systematic error (for example, with the B-score method, as discussed by Gunter et al.³) and normalized to controls. Typically, normalized data are expressed as 'percentage inhibition' or 'percentage activity', or as potency values (for example, EC50, IC50 or AC50) if multiple concentrations are tested in the screen. For example, % inhibition = 100 × (average of negative control − corrected sample result)/(average of negative control − average of positive control), which gives 0% at the negative control level and 100% at the positive control level.
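Both steps can be sketched in a few lines. Below, `percent_inhibition` is written so that it is 0% at the negative-control mean and 100% at the positive-control mean. `b_scores` is a minimal version of the B-score as it is usually described: a Tukey two-way median polish removes row and column effects, and the residuals are then scaled by their median absolute deviation (MAD). The 1.4826 factor (which makes the MAD consistent with a standard deviation for normal data) is a common convention that some definitions omit, and the iteration limits are arbitrary choices.

```python
from statistics import median

# Sketch of two common post-processing steps: control-based normalization
# and a minimal B-score (two-way median polish, then MAD scaling). The
# 1.4826 MAD-to-SD factor is a convention that some definitions omit.


def percent_inhibition(sample: float, neg_mean: float, pos_mean: float) -> float:
    """0% at the negative-control mean, 100% at the positive-control mean."""
    return 100.0 * (neg_mean - sample) / (neg_mean - pos_mean)


def median_polish(plate, max_iter: int = 10, tol: float = 1e-9):
    """Return residuals after alternately removing row and column medians."""
    resid = [[float(v) for v in row] for row in plate]
    n_cols = len(resid[0])
    for _ in range(max_iter):
        largest_shift = 0.0
        for row in resid:                      # remove row effects
            m = median(row)
            row[:] = [v - m for v in row]
            largest_shift = max(largest_shift, abs(m))
        for j in range(n_cols):                # remove column effects
            m = median(row[j] for row in resid)
            for row in resid:
                row[j] -= m
            largest_shift = max(largest_shift, abs(m))
        if largest_shift < tol:
            break
    return resid


def b_scores(plate, consistency: float = 1.4826):
    """Median-polish residuals divided by their (scaled) MAD."""
    resid = median_polish(plate)
    flat = [v for row in resid for v in row]
    center = median(flat)
    mad = median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD is zero; B-scores are undefined")
    return [[v / (consistency * mad) for v in row] for row in resid]
```

For simplicity the polish is applied here to the whole plate matrix; screening pipelines commonly restrict it to sample wells so that control columns do not distort the row and column estimates.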

Performance of the assay during the screen is commonly calculated using the Z factor, a statistical parameter that takes into account both the separation between signal and background and the variation of the assay signal⁴. Plate-based Z factor determinations provide a measure of screen performance. Both control (Z′) and sample (Z) factors can be reported. The minimum significant ratio (MSR) can be reported for in-plate reference titrations⁵.
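The statistics named above are short formulas. The sketch below uses the standard form Z′ = 1 − 3(σ₊ + σ₋)/|μ₊ − μ₋| for controls, and the same expression with sample wells in place of one control population for Z. For the MSR it assumes the form MSR = 10^(2√2·s), where s is the standard deviation of log10 potencies from repeated determinations of the same reference compound; that formula should be checked against the cited reference before being relied on.

```python
import math
from statistics import mean, stdev

# Sketch of screening-quality statistics. Z' (controls only) and Z (samples
# vs. a control) share the form 1 - 3*(sd_a + sd_b)/|mean_a - mean_b|.
# msr() ASSUMES MSR = 10**(2*sqrt(2)*s), with s the standard deviation of
# log10 potencies from replicate determinations of a reference compound.


def z_prime(pos, neg) -> float:
    """Control-based Z' factor from positive and negative control wells."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def z_factor(samples, control) -> float:
    """Sample-based Z factor: library wells compared with a control population."""
    return 1.0 - 3.0 * (stdev(samples) + stdev(control)) / abs(mean(samples) - mean(control))


def msr(potencies) -> float:
    """Minimum significant ratio from replicate potency values (same units)."""
    logs = [math.log10(p) for p in potencies]
    return 10 ** (2.0 * math.sqrt(2.0) * stdev(logs))
```

A Z′ between 0.5 and 1 is commonly regarded as indicating an excellent assay. An MSR of 1 would mean perfectly reproducible potencies; an MSR of 3, for example, means that potencies differing by less than threefold cannot be distinguished reliably.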

Post-HTS analysis

After HTS, the primary screening data are analyzed and follow-up assays are carried out to confirm the activity of compounds that score as 'positives' in the primary screen. Chemical structures of active compounds should be verified before being reported. Addressing the following questions should provide the audience with an understanding of how the investigator arrived at the compounds of interest: how were compounds selected as active in the primary screen? How were the initial active compounds retested to confirm activity? How was compound chemical structure confirmed? Have active compounds been further purified or resynthesized? Two examples:

Data were normalized as percentage activity relative to the positive control. Active compounds were defined as those above the 99.5th percentile. Individual samples of actives were rearrayed in 384-well plates from separately maintained master samples and retested in the original screening assay at a fixed concentration of 10 μM, in triplicate. Compounds showing reproducible activity (in 2 or 3 of the 3 replicates) were used to generate 10-point dose-response curves ranging from 30 μM to 1 nM, in triplicate. Active compounds of interest for further study were defined as those with a reproducible EC50 of less than 1 μM. These compounds were subjected to further secondary screening, including a second biochemical test of activity, a cellular measure of pathway activity, cytotoxicity measurements in four human cell lines, and solubility and permeability measurements.
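The selection rules in this example are explicit enough to be written as code, which is one way to make them unambiguous in a report. The sketch below assumes that higher percentage activity means more active, uses the nearest-rank definition of a percentile (other definitions give slightly different cutoffs), and uses hypothetical compound IDs and data structures. It also generates the log-spaced 10-point series from 30 μM to 1 nM.

```python
import math

# Sketch of the hit-calling steps in the example above. ASSUMPTIONS: higher
# percentage activity means more active; the percentile uses the
# nearest-rank definition; compound IDs and data structures are hypothetical.


def percentile_nearest_rank(values, pct: float) -> float:
    """Smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    k = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(k, 1) - 1]


def primary_actives(activity: dict, pct: float = 99.5) -> set:
    """IDs whose % activity lies above the pct-th percentile (top 0.5% by default)."""
    cutoff = percentile_nearest_rank(activity.values(), pct)
    return {cid for cid, value in activity.items() if value > cutoff}


def confirmed_actives(retest_calls: dict, min_hits: int = 2) -> set:
    """IDs active in at least min_hits of their replicate retests (e.g. 2 of 3)."""
    return {cid for cid, calls in retest_calls.items() if sum(calls) >= min_hits}


def dose_series(top_um: float = 30.0, bottom_um: float = 0.001, n: int = 10) -> list:
    """n log-spaced concentrations (uM) from top_um down to bottom_um."""
    ratio = (top_um / bottom_um) ** (1.0 / (n - 1))
    return [top_um / ratio ** i for i in range(n)]
```

Spanning 30 μM to 1 nM in 10 points implies a dilution step of roughly 3.1-fold (30,000^(1/9)), which is worth stating when the series is not a simple half-log or threefold dilution.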

All compounds of interest were subjected to LC/MS/ELSD analysis using a sample from the original screening stock and a sample from the medium remaining after completion of the dose-response study (at the highest concentration). Each compound used for subsequent studies was either repurchased from the original vendor and purified in-house, or synthesized and purified in-house.

Results

Ranking primary screening actives and confirmed active compounds is a useful way to describe the outcome of the HTS and subsequent analysis, and the description is more informative when it includes comments on the factors that disqualified initial actives and on the ranking strategies used. Example:

The structures of all primary HTS and confirmed active compounds are included in the supplementary materials. Compounds considered for additional study, ranked by percentage activity, are shown in Figure 1. Confirmed active compounds are defined as those with a potency that is equal (within five-fold) in both biochemical assays, a similar (within ten-fold) potency in the cellular assay, a minimum 20-fold window between the cellular activity assay and any single cytotoxicity assay, solubility of greater than 10 μM in water (pH 7.4), and permeability in the PAMPA model (pH 7.4 to 7.4) of greater than 1,000 × 10⁻⁶ cm s⁻¹.
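Multi-criterion definitions like this one are easiest to audit when written as a single filter. The sketch below is one possible reading of the example, under several labeled assumptions: potencies are EC50 values in μM; "similar potency in the cellular assay" is judged against the first biochemical assay; the cytotoxicity window uses the lowest CC50 across cell lines; and the permeability cutoff is 1,000 in units of 10⁻⁶ cm s⁻¹. The `Profile` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Sketch of the multi-criterion triage in the example above. ASSUMPTIONS:
# potencies are EC50 values in uM; "similar potency in the cellular assay"
# is judged against the first biochemical assay; the cytotoxicity window
# uses the most potent (lowest) CC50; permeability is expressed in units of
# 1e-6 cm/s. Field names are hypothetical.


@dataclass
class Profile:
    compound_id: str
    pct_activity: float
    ec50_biochem_1: float
    ec50_biochem_2: float
    ec50_cell: float
    cc50_per_line: List[float]
    solubility_um: float
    pampa_pe: float  # effective permeability, 1e-6 cm/s


def fold(a: float, b: float) -> float:
    """Fold difference between two positive values, regardless of order."""
    return max(a, b) / min(a, b)


def is_confirmed(p: Profile, pe_cutoff: float = 1000.0) -> bool:
    return (fold(p.ec50_biochem_1, p.ec50_biochem_2) <= 5.0     # biochemical agreement
            and fold(p.ec50_biochem_1, p.ec50_cell) <= 10.0     # cellular agreement
            and min(p.cc50_per_line) / p.ec50_cell >= 20.0      # cytotoxicity window
            and p.solubility_um > 10.0
            and p.pampa_pe > pe_cutoff)


def ranked_confirmed(profiles: List[Profile]) -> List[str]:
    """IDs of confirmed compounds, highest percentage activity first."""
    hits = [p for p in profiles if is_confirmed(p)]
    return [p.compound_id for p in sorted(hits, key=lambda p: p.pct_activity, reverse=True)]
```

Writing the criteria this way also makes it straightforward to report, for each disqualified compound, which criterion it failed.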

Conclusions

In this commentary, we have proposed an initial set of guidelines for reporting small-molecule HTS data, which is summarized in Table 1. The development of a standardized protocol to describe small-molecule screening projects would aid in the unambiguous interpretation of published results and facilitate transfer of screening data between databases. The screening and data standards communities should develop these guidelines more fully, based on the frameworks of existing minimum information guidelines efforts⁶ (http://mibbi.sourceforge.net/), and to the level of detail of the minimum information about a cellular assay (MIACA) data model⁷ (http://miaca.sourceforge.net/) and related standards that describe biological experiments.