Exploratory analysis of kinetic solubility measurements of a small molecule library

https://doi.org/10.1016/j.bmc.2011.05.005

Abstract

Kinetic solubility measurements using prototypical assay buffer conditions are presented for a ∼58,000 member library of small molecules. Analyses of the data based upon physical and calculated properties of each individual molecule were performed and resulting trends were considered in the context of commonly held opinions of how physicochemical properties influence aqueous solubility. We further analyze the data using a decision tree model for solubility prediction and via a multi-dimensional assessment of physicochemical relationships to solubility in the context of specific ‘rule-breakers’ relative to common dogma. The role of solubility as a determinant of assay outcome is also considered based upon each compound’s cross-assay activity score for a collection of publicly available screening results. Further, the role of solubility as a governing factor for colloidal aggregation formation within a specified assay setting is examined and considered as a possible cause of a high cross-assay activity score. The results of this solubility profile should aid chemists during library design and optimization efforts and represent a useful training set for computational solubility prediction.

Graphical abstract

Kinetic solubility measurements are presented for a ∼58,000 member library of small molecules and the data is examined in the context of physicochemical properties, assay outcomes and optimization strategies.


Introduction

Aqueous solubility is a governing property for how small molecules interact with biomolecules (proteins, nucleic acids, etc.) and living systems (cells, tissues and whole organisms). A broader appreciation of aqueous solubility is changing how researchers pursue the drug discovery process from library design to screening to hit optimization. The physicochemical properties of small molecules intended for primary screening and lead development changed greatly as drug discovery approaches evolved in response to the advent of combinatorial chemistry and high-throughput screening (HTS).1, 2, 3, 4, 5 The synthesis and screening of hundreds of thousands of small molecules made it difficult and, to some, unnecessary to pay attention to the physicochemical properties of all library members and, as a result, screening hits became increasingly lipophilic. A growing contingent of researchers began to consider the increasing difficulty in transforming screening hits into clinical candidates by asking fundamental questions about the differences between failed lead compounds and approved drugs. Numerous lessons resulted from these analyses, including the set of parameters known as the rule of five first defined by Lipinski.6 In this analysis, Lipinski et al. scrutinized a set of >2000 orally bioavailable clinical agents for physical and calculated properties including molecular weight, H-bond donors, H-bond acceptors and c Log P. The results suggested that the 90th percentile of drug-like compounds with acceptable solubility and permeability possessed a MW under 500, no more than five H-bond donors, no more than 10 H-bond acceptors and a c Log P value less than five. Subsequent guidelines add other physicochemical descriptors, including compound flexibility and polar surface area.2
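As an illustration of how the rule of five is typically applied as a filter, the following is a minimal Python sketch. The `Descriptors` container and the `rule_of_five_violations` helper are hypothetical names introduced here, not part of the study. The descriptor values are assumed to be supplied by the caller (for example, from a cheminformatics toolkit), and the handling of boundary values (a compound at exactly MW 500 or c Log P 5 passes) follows the commonly cited form of the rule and is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float        # molecular weight, g/mol
    h_bond_donors: int       # count of H-bond donors
    h_bond_acceptors: int    # count of H-bond acceptors
    clogp: float             # calculated log P


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """List the rule-of-five criteria that a compound exceeds.

    Assumption: a value equal to the threshold is not a violation.
    """
    violations = []
    if d.mol_weight > 500:
        violations.append("MW > 500")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    if d.clogp > 5:
        violations.append("c Log P > 5")
    return violations


if __name__ == "__main__":
    # Illustrative, made-up descriptor values.
    compliant = Descriptors(350.4, 2, 5, 3.1)
    rule_breaker = Descriptors(612.7, 6, 11, 5.8)
    print(rule_of_five_violations(compliant))
    print(rule_of_five_violations(rule_breaker))
```

Returning the list of exceeded criteria, rather than a single pass/fail flag, makes it easier to inspect which property marks a compound as a 'rule-breaker'.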

An additional consequence of the combichem-HTS revolution was the requirement for a carrier solvent for small molecules entering HTS assays. The kinetics of these assays required the rapid aqueous/buffer solvation of each library member and dimethyl sulfoxide (DMSO) was widely adopted as an appropriate solvent for the storage and dispensing of compound libraries into assay wells. This was a sharp departure from experiments that allow an agent to reach aqueous solubility equilibrium over time. The former version of solubility (referred to as kinetic solubility) is considered more appropriate for discovery settings while the latter (referred to as thermodynamic solubility) often takes precedence in formulation and dosing studies.7, 8 The differences between kinetic and thermodynamic solubilities are often ignored in discovery settings.

Strategies for optimizing libraries for kinetic solubility have greatly aided both assay performance and the ability of screening leads to enter optimization efforts.9, 10 Many of these strategies derive from a combination of chemists’ intuition and the critical analyses of existing data sets leading to predictive models of solubility. Unfortunately, the ability to gather true aqueous solubility values at a defined pH on large compound collections has been limited by the technologies and cost of acquiring such data and most data sets are restricted to related analogues being evaluated as part of an optimization effort. These data sets are often compromised due to inadequate structural diversity of the compounds, inconsistencies in methods and bias toward expected outcomes. Despite imperfect training sets, numerous computational tools (commercial and noncommercial) for solubility prediction exist to help during library design and optimization efforts.11, 12, 13, 14, 15 Solubility measurements from a large compound collection achieved via a common methodology would be useful in terms of validating current dogma and as a relevant training set for advanced computational models. In 2009, Clark and co-workers reported the kinetic solubilities from a drug-like collection of >700 compounds and provided an analysis of the results in terms of selected physical and calculated descriptors of the library members.16 In 2010, Hill and Young reported an analysis of kinetic solubility of a large compound library (∼100 K) and experimentally derived values of hydrophobicity (Log D at pH 7.4) for a subset of this library (∼20 K).17 This report provided enlightening lessons on the relationship between calculated and experimentally determined Log D/Log P values and also explored the impact that aromatic ring content had on solubility.

Here, we describe an exploratory analysis of kinetic solubility measurements for 57,857 compounds of the NIH Molecular Libraries Small Molecule Repository (MLSMR). We relate the solubility of this library to specific compound physical characteristics and calculated properties. Further, we examine subsets of this data to help understand compounds that deviate from expected trends and specific ‘rule-breakers’ in order to better advise chemists hoping to optimize agents with undesirable physicochemical properties. We also examine the relationship between solubility and the frequency of reported activities from primary screens with a particular focus on agents that are putative aggregators within a reported β-lactamase screen.18 Importantly, the results from this study are publicly available through the PubChem database to allow researchers access to this valuable data set (http://pubchem.ncbi.nlm.nih.gov/).

Section snippets

Method

Solubility was measured by Chemiluminescent Nitrogen Detection (CLND) from stock 10 mM DMSO solutions (6 μL) dispensed into PBS buffer (294 μL, pH 7.4).19 The equimolar nitrogen response of the detector was calibrated using TRIZMA base at 28 concentrations spanning the dynamic range of the instrument from 0.08 to 4500 μg/mL nitrogen and the measured solubility values were corrected for background nitrogen. On-board performance-indicating standards (Imipramine HCl, Sulfamethizole
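The dilution stated in the Method fixes the highest concentration a compound can reach in the well. The Python sketch below works through that arithmetic for the stated volumes (6 μL of 10 mM stock into 294 μL of buffer). All function names are hypothetical. The last helper, which converts a nitrogen mass concentration to a compound mass concentration, is an assumption based only on the equimolar nitrogen response mentioned above; it is not the authors' documented data reduction.

```python
NITROGEN_ATOMIC_MASS = 14.007  # g/mol


def final_concentration_um(stock_mm: float, v_stock_ul: float, v_buffer_ul: float) -> float:
    """Nominal compound concentration (uM) after dilution, assuming additive volumes."""
    return stock_mm * 1000.0 * v_stock_ul / (v_stock_ul + v_buffer_ul)


def cosolvent_percent(v_stock_ul: float, v_buffer_ul: float) -> float:
    """DMSO content of the final mixture in % v/v."""
    return 100.0 * v_stock_ul / (v_stock_ul + v_buffer_ul)


def um_to_ug_per_ml(conc_um: float, mol_weight: float) -> float:
    """Convert uM to ug/mL for a compound of the given molecular weight (g/mol)."""
    return conc_um * mol_weight / 1000.0


def compound_from_nitrogen(nitrogen_ug_per_ml: float, mol_weight: float, n_nitrogens: int) -> float:
    """Compound concentration (ug/mL) implied by a nitrogen reading (ug/mL).

    Assumption: every nitrogen atom contributes equally to the signal.
    """
    if n_nitrogens < 1:
        raise ValueError("nitrogen-based detection needs at least one nitrogen atom")
    return nitrogen_ug_per_ml * mol_weight / (n_nitrogens * NITROGEN_ATOMIC_MASS)


if __name__ == "__main__":
    top = final_concentration_um(10.0, 6.0, 294.0)
    print(f"nominal top concentration: {top:.0f} uM")
    print(f"DMSO content: {cosolvent_percent(6.0, 294.0):.1f} % v/v")
    # Hypothetical compound with MW 400 g/mol.
    print(f"ceiling for MW 400: {um_to_ug_per_ml(top, 400.0):.0f} ug/mL")
```

Under these volumes the nominal ceiling is 200 μM in 2% v/v DMSO, so the highest reportable value in μg/mL scales with molecular weight (0.2 × MW).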

Results and discussion

Figure 1 summarizes the distribution of measured solubility values and the breakdown of the dataset by the solubility classes ranging from <5, 5–10, 10–15 μg/mL, and so on [all data groupings are closed to the lower number (i.e., 5 μg/mL to the lowest value below 10 μg/mL)]. Based upon the PubChem definitions, the majority of compounds are moderately soluble (39,301 or 67.9%) followed by a large percentage of low soluble compounds (17,574 or 30.4%). The high soluble group constitutes just 1.7% of
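The grouping convention and the class percentages quoted above can be checked with a short Python sketch. The helper names are hypothetical. The count for the high solubility class is not stated in the text; here it is derived by subtraction from the 57,857 compounds, on the assumption that the three classes partition the full data set.

```python
import math
from typing import Dict, Tuple

BIN_WIDTH = 5.0  # ug/mL, width of each solubility class in Figure 1


def solubility_bin(value_ug_per_ml: float) -> Tuple[float, float]:
    """Return the half-open bin [lower, upper) that contains the value.

    Bins are closed on the lower bound, so 5.0 falls in [5, 10).
    """
    if value_ug_per_ml < 0:
        raise ValueError("solubility cannot be negative")
    lower = math.floor(value_ug_per_ml / BIN_WIDTH) * BIN_WIDTH
    return lower, lower + BIN_WIDTH


def class_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the total for each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


TOTAL_COMPOUNDS = 57857
counts = {"moderate": 39301, "low": 17574}
# Derived by subtraction (assumption: the three classes cover every compound).
counts["high"] = TOTAL_COMPOUNDS - sum(counts.values())

if __name__ == "__main__":
    print(solubility_bin(5.0))
    print(solubility_bin(9.99))
    print(class_percentages(counts))
```

The derived high solubility count reproduces the quoted 1.7%, which suggests the three reported figures are mutually consistent.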

Conclusion

In this study we have examined the kinetic solubility for a large chemical library and examined the data in a variety of ways. Foremost, commonly used physicochemical properties were correlated to solubility outcomes and generally confirmed the existing dogma in terms of solubility relationships. However, the data also made it clear that chemists cannot rely solely on the predictive capacity of physicochemical trends to assure that their libraries and optimization efforts result in soluble

Acknowledgments

The authors would like to thank Dr. Douglas Livingston, Dr. Timothy Lease and the staff at Biofocus DPI for their assistance with compound management. We further acknowledge Dr. Jamie Driscoll for support of this effort. We thank Dr. Ed Kearns for helpful discussion during the writing of this manuscript. This research was supported by the Molecular Libraries Initiative of the National Institutes of Health Roadmap for Medical Research grants U54 HG005033-02 to G.P.R. and the Intramural Research

References and notes (29)

  • S. Stegemann et al., Eur. J. Pharm. Sci. (2007)
  • C.A. Lipinski et al., Adv. Drug Delivery Rev. (2001)
  • J. Alsenz et al., Adv. Drug Delivery Rev. (2007)
  • L. Di et al., Drug Discovery Today (2006)
  • A.P. Hill et al., Drug Discovery Today (2010)
  • S.N. Bhattachar et al., J. Pharm. Biomed. (2006)
  • G.M. Rishton, Drug Discovery Today (2003)
  • C. Lipinski, Am. Pharm. Rev. (2002)
  • P.D. Leeson et al., Nat. Rev. Drug Discovery (2007)
  • J.P. Kennedy et al., J. Comb. Chem. (2008)
  • J. Inglese et al., Nat. Chem. Biol. (2007)
  • E.H. Kerns et al., Drug-like properties: Concepts, Structure Design and Methods (2008)
  • C. Saal, Am. Pharm. Rev. (2010)
  • J.S. Delaney, J. Chem. Inf. Comput. Sci. (2004)