1 Introduction

QM/MM-MD [1, 4, 5, 14, 15, 29, 37, 43, 44] simulations are of ever-growing interest for the investigation of metal-ion complexes [19, 28, 30, 38, 41] and biomolecules [36], such as proteins and ribonucleic acids, because of their methodological flexibility, the capacity to investigate sub-picosecond events as well as the possibility to predict inter- or intramolecular interactions and even chemical reactions. Besides the higher accuracy, the main advantage of hybrid QM/MM-MD to classical MM-MD simulations that utilize popular force fields like AMBER [9], CHARMM [8], OPLS [23] or GROMOS [35] is the ability to describe effects stemming from the intra- or intermolecular shifts of electron density for any desired chemical system. Due to this, phenomena such as charge transfers, the cleavage or formation of covalent bonds and polarization effects can be studied in-depth. Additionally, the influence of different spin states, Jahn Teller distortion and even electronic excitations can be investigated without having to resort to specifically tailored force fields.

One of the major challenges for that kind of investigation is a proper description of the interface between the QM and the MM region, especially if covalent bonds are reaching through the boundary between the two regions [27]. There exist a variety of approaches to describe that frontier bond, for example capping potentials [22, 26], generalized hybrid orbitals [16] or localized self-consisted field [3, 33, 40], but the simplest, yet very accurate [2] and therefore also highly popular [27, 36] method is hydrogen capping [39].

A big question regarding hydrogen-capping approaches is the proper placement of the inserted atom, so that its influence on the behavior of the surrounding atoms is minimized as much as possible. In a previous work [18], this was discussed in detail for amino acids, with the recommendation to use separate parameters for each type of sidechain to considerably improve the description of the link bond. In this work, the focus lies on the QM/MM separation of DNA nucleosides, with the C–N bond between the deoxyribose and the nucleobase acting as the link bond. The investigated QM methods were RI-MP2 [31], B3LYP [6], B3LYP-D3 [17], BLYP [7] and BLYP-D3, as well as the resolution of identity (RI) [34] variants of the density functional theory [20, 25, 32] (DFT) calculations. All computations were performed, utilizing a triple-zeta basis set [12, 45].

2 Methodology

The applied link atom scheme is entirely analog to the one used in the investigation of the amino acids [18]: the link- or capping atom (\(L_{\rm{C}}\)) is positioned on the vector connecting the frontier QM atom (\(L_{\rm{Q}}\)) and the frontier MM atom (\(L_{\rm{M}}\)). Instead of a fixed distance for the \(L_{\rm{Q}}\)-\(L_{\rm{C}}\) bond, a distance ratio \(\rho\) is introduced, defining the position of \(L_{\rm{C}}\) (\({{\mathbf {r}}}_{L_{\rm{C}}}\)) as:

$$\begin{aligned} {{\mathbf {r}}}_{L_{\rm{C}}} = \rho \cdot ({{\mathbf {r}}}_{L_{\rm{M}}}-{{\mathbf {r}}}_{L_{\rm{Q}}}) + {{\mathbf {r}}}_{L_{\rm{Q}}} \end{aligned}$$
(1)

with \({{\mathbf {r}}}_{L_{\rm{Q}}}\) being the position of \(L_{\rm{Q}}\) and \({{\mathbf {r}}}_{L_{\rm{M}}}-{{\mathbf {r}}}_{L_{\rm{Q}}}\) corresponding to the distance between the frontier atoms, respectively. To identify the best possible ratio (i.e., the one where the energy minima of the link bond and the full QM bond coincide), model calculations of the nucleosides deoxyadenosine (A), deoxyguanosin (G), deoxythymidine (T) and deoxycytidine (C) have been conducted. All model systems were set up by initial full QM geometry optimization utilizing TURBOMOLE 6.5 [42]. Solvation effects were approximated implicitly using the COSMO [24] model. Atomic orbitals of H, C, N and O were described by the cc-pVTZ [12, 45] triple-zeta basis set and the D3 method was applied, according to Grimme et al. [17].

In the QM/MM calculations, all nucleobases were computed via one of the QM methods, while deoxyribose was described by the AMBER-99SB [21] force field (see Fig. 1).

Fig. 1
figure 1

Deoxyadenosine model system, used for the parametrization. The blue adenine base is described by one of the QM methods, while all atoms depicted in red are computed by the force field. Both regions are connected via the link bond (green)

The system’s total energy is evaluated for the minimum distance \(r_{\rm{0}}\) of the \(L_Q-L_{\rm{M}}\) bond, obtained from the initial geometry optimization, as well as for a set of slightly higher and lower distances and several different values of \(\rho\). The resulting energy—distance graphs are compared to that of a full QM description of the same system (see Fig. 2a). Once the ideal value for \(\rho\) is identified, the zero points of both descriptions are aligned to one another by subtraction of the respective energy minimum (\(E_{\rm{{min}}}\)). The remaining difference can now be obtained by subtracting the QM/MM energy (\(E_{{\rm{QM/MM}}}\)) from the one of the full QM calculation (\(E_{{\rm{QM}}}\)) and described by a harmonic potential of the form:

$$\begin{aligned} V = \dfrac{k_{\rm{L}}}{2} \cdot (|{{\mathbf {r}}}_{L_{\rm{Q}}}-{{\mathbf {r}}}_{L_{\rm{M}}}| - r_{{\rm{0}}})^2 \end{aligned}$$
(2)

with V representing the potential energy, \({{\mathbf {r}}}_{L_{\rm{Q}}}-{{\mathbf {r}}}_{L_{\rm{M}}}\) the distance between the two atoms that define the bond and \(r_{\rm{{0}}}, k_L\) the equilibrium distance and the force constant, respectively. \(r_{\rm{0}}\) and \(k_{\rm{L}}\) can be obtained by fitting the harmonic potential to the energy difference, considering the three lowest points. The resulting harmonic potential can now be added to the overall QM/MM description of the link bond as a classical bond between \(L_{\rm{Q}}\) and \(L_{\rm{M}}\). Hence, the link bond is defined by a parameter set \(\rho , r_{\rm{0}}, k_{\rm{L}}\) and, provided proper parameters are used, accurately mimics the potential energy well of a full QM description of the host bond—not only for the equilibrium position but also for configurations where the bond length is deviating from \(r_{\rm{0}}\).

Fig. 2
figure 2

a Graphical depiction of the parametrization process for the RI-MP2 representation of deoxyadenosine. The black line represents the potential energy well of the full QM description, whereas the red, dashed line stems from a QM/MM energy scan employing the ideal distance ratio of 0.6978 and the green line shows the energy difference between the two. If the harmonic potential (blue) that is fitted to the energy difference is added to the overall description of the link bond via a force field parameter, the resulting link bond description (orange, dashed line) exactly mimics the full QM case. b Comparison of a QM/MM scan, utilizing the ideal distance ratio (red, dashed) to scans with \(\rho\), deviating by −0.01 (green, dashed) or +0.01 (blue, dashed). The full QM scan is again represented by a solid black line

As Fig. 2b shows, varying the distance ratio by a little margin, such as \(\pm\)0.01, results in bond lengths deviating by \(\pm\)0.02 Å—which is the same error as one would introduce by switching from RI-MP2 to BLYP. The obtained graphs also suggest that at the full QM minimum, the potential well for \(\rho\) = \(\rho(ideal)\pm\)0.01 is still very much repulsive or attractive with a remaining potential energy of \({\sim}\) 0.1 kcal/mol.

The description of the polar C–N bond by N–H also retains the direction of the resulting dipole, since the electronegativity of hydrogen is comparable to that of carbon and thus also lower than the electronegativity of nitrogen.

3 Results and discussion

The parametrization process has been performed following the scheme outlined in Ref. [18] and Sect. 2. The resulting distance ratios \(\rho\) (depicted in Fig. 3; Table 1) indicate that there is (1) a large gap between the results obtained for MP2 and DFT and (2) a considerable difference between the pyrimidine based nucleosides and the purine ones. The data also suggests that the structural differences between the two pyrimidines, cytosine and thymine, are so small that the resulting ideal distance ratios are very similar to each other. The two investigated purine nucleobases, adenine and guanine, show analogous behavior.

Fig. 3
figure 3

Graphical representation of the obtained distance ratios for all investigated deoxynucleosides. The color coding of the data groups labeled A, G, C and T indicates the different QM methods, investigated in this work: blue RI-MP2, orange B3LYP, yellow B3LYP-D3, green BLYP and brown BLYP-D3. Also depicted are the recommended average values for pyrimidines (py) and purines (pu), respectively

Table 1 Overview over the obtained distance ratios \(\rho\) for all investigated QM methods

In consequence, it seems necessary to use separate distance ratios for purine and pyrimidine bases and for each investigated QM method, with the exception of BLYP-D3 and B3LYP where the sensitivity of the potential energy well with respect to the distance ratio shows the same behavior. Similar clustering can be witnessed if one looks at the results in Tables 2 and 3, where also some method-specific differences are visible: DFT and especially the variants of BLYP, compared with RI-MP2 reference calculations, overestimate bond lengths by some extend, thus resulting in very different values for

Table 2 Overview over the obtained parameters for the harmonic fit

\(r_{\rm{0}}\). At the same time also the force constants \(k_{\rm{L}}\) are relatively smaller. On another note, the addition of dispersion correction to DFT and especially to B3LYP causes the results to be much closer to the RI-MP2 reference data. As expected, parameters from the resolution of identity (RI)-based calculations are almost identical to their non-RI counterparts but nevertheless are presented in the above tables for completion.

The force constants, used to correct the energy difference between a full QM and the QM/MM bond description, vary between 24.68 and 153.39 kcal/mol/Å2 within the range of results of the DFT methods, so the maximum deviation lies at \({\sim}130\) kcal/mol/Å2 —this is larger than the average value of \(k_{\rm{L}}\) of all investigated DFT methods (which is only 94 kcal/mol/Å2). In order to assess whether the observed deviations of \(k_{\rm{L}}\) stem from the chosen QM method or are an artifact of the QM/MM treatment, the full QM force constants \(k_{{\rm{QM}}}\) were obtained by fitting a harmonic potential to the innermost three data points of the full QM scan. The results, shown in Table 3 indicate that (1) the strength of the force constants is indeed strongly dependent upon the chosen QM description

Table 3 Full QM force constants \(k_{{\rm{QM}}}\) and C–N distances \(r_{{\rm{QM}}}\)

(ranging from \({\sim}560\) kcal/mol/Å2 for BLYP and \({\sim}640\) kcal/mol/Å2 for B3LYP to \({\sim}730\) kcal/mol/Å2 for RI-MP2), thus implying that the deviation of \(k_{\rm{L}}\) does not stem from the QM/MM treatment, and (2) the force constants \(k_{{\rm{QM}}}\) of the DFT methods deviate between 547 (BLYP scan of deoxythymidine) and 718 kcal/mol/Å2 (B3LYP-D3 scan of deoxyguanosine) and the average \(k_{{\rm{QM}}}\) of all DFT scans is approximately 613 kcal/mol/Å2. With a potential maximum error of the parametrization lying at \({\sim}130\) kcal/mol/Å2 which is roughly 21% of the total force constant, it is clear that separate harmonic fit parameters for both, BLYP and B3LYP methods, as well as for the purine and pyrimidine bases are necessary. It is also obvious that RI-MP2 calculations demand different parameters altogether. However, the influence of the D3-correction to the energy correction terms is so small that individual parameters for D3 and non-D3 DFT methods do not seem necessary. The final suggested parameters are presented in Table 4. Two things are apparent when looking at the results: firstly, for the distance ratios the influence of the investigated nucleoside’s nature becomes more prominent, the more accurate the chosen QM method is: the ideal distance ratios for BLYP (without dispersion correction) show the least deviation from one another (the difference between the mean values for purine and pyrimidine are just 0.0036). As soon as dispersion correction is employed, the variation of results is much larger (0.0049) and the distance ratios are almost identical to those of B3LYP (without dispersion correction). The MP2-based results show the highest sensitivity regarding the nature of the nucleobase with a purine–pyrimidine deviation of 0.0105. Secondly, whereas dispersion correction has a very large influence on the values of \(\rho\), this is not true for \(r_{\rm{0}}\) and \(k_{\rm{L}}\) so that combined values for DFT and DFT-D3 can be safely used. Instead, an even stronger discrimination between the DFT methods has to be made. This is due to the fact that the resulting predicted bond lengths (\(r_{{\rm{QM}}}\)) and force constants (\(k_{{\rm{QM}}}\)) vary quite a bit, as given in Table  3. By calculating distance ratios, this differences are canceled out to a certain extend because they are directly related to the ideal C–N distance, as well as the length of a N–H bond, as predicted by the selected QM method (see Eq. 1). Hence, if BLYP overestimates the length of the C–N bond compared with B3LYP, the N–H distance is simultaneously overestimated by a similar scale. Therefore, the resulting distance ratios can be very similar nevertheless. This, however, is not true if one looks at the harmonic potential: since this represents the energy difference between the full QM and the QM/MM description of the bond stretching, it is directly dependent only of the predicted bond length and energy. In effect, the methodological characteristics are brought to bear to a much larger extend.

Table 4 Suggested QM/MM link bond parameters for purines and pyrimidines. \(r_{\rm{0}}\) are given in Å and \(k_{\rm{L}}\) in kcal/mol/Å2, respectively

It should be noted that in the QM/MM framework, used for finding above parameters, classical angle and dihedral terms were included in the calculations if at least one MM atom was present in the definition. Often angle and dihedral terms with only QM atoms at their center are excluded in order to prevent double counting [13]. Inclusion or exclusion of said terms, however, has absolutely no influence on the obtained values for \(\rho , k_{\rm{L}}, r_{\rm{0}}\) as only the bond length between \(L_{\rm{Q}}\) and \(L_{\rm{M}}\) was varied, and therefore all contributions from angle and dihedral terms remained constant throughout. This would of course not hold true for simulations at a finite temperature. Another potential influence stems from the way q(\(L_{\rm{M}}\)), the charge of \(L_{\rm{M}}\), was treated. In the used approach, the charge was neglected in the embedding scheme in order to prevent over-polarization. An alternative method would be the distribution of q(\(L_{\rm{M}}\)) to its remaining binding partners [10] (only for the embedding—the force field still uses the original set of charges). While a variation of charges can surely be expected to have an influence, the fact that for the investigated nucleosides the charges q(\(L_{\rm{M}}\)) provided by AMBER-99SB are all in the region between 0.01 and 0.07, this influence is only very minor. It has been shown that even for molecules where q(\(L_{\rm{M}}\)) is considerably larger (between −0.13 and −0.26), the influence on the parameters is still very small [18]. A much larger [18] influence can be expected if the AMBER-94 charges used in the majority of AMBER force field parametrizations are replaced by a different set (e.g., AMBER-03-charges [11]).

4 Conclusion

The obtained results indicate that, depending on the level of theory being utilized in the description of the QM part of the simulation, a careful choice of the used parameters can improve the quality of the description considerably, without additional computational effort. The purine and the pyrimidine bases are chemically dissimilar enough to warrant independent distance ratios in all cases. However, it does not seem to be necessary to discriminate between adenine/guanine and cytosine/thymine, respectively. Generally, separate distance ratios \(\rho\) for all investigated QM methods are suggested, however, due to the fact that those ratios are also very dependent of dispersion correction, sometimes differences between methods can be levelled out. As for the correction terms, which are far more sensitive to the nature of the chosen method than to dispersion correction, the method-specific influences are much more pronounced. Therefore, different \(r_{\rm{0}}\) and \(k_{\rm{L}}\) parameters for RI-MP2, B3LYP and BLYP are necessary in order to exactly mimic the potential energy well of the full QM C–N bond by the utilized link atom approach.