1 Introduction

In 2012, the ATLAS and CMS Collaborations announced the observation [1–3] of a new boson with a mass of about 125\(\,\text {GeV}\), with properties consistent with expectations for the standard model (SM) Higgs boson. The Higgs boson (\({H}\)) is the particle predicted to exist as a consequence of the spontaneous symmetry breaking mechanism acting in the electroweak sector of the SM [4–6]. This mechanism was suggested more than fifty years ago, and introduces a complex scalar field, which gives masses to the W and Z bosons [7–11]. The scalar field also gives mass to the fundamental fermions through a Yukawa interaction [5]. The couplings and spin of the new boson are found to be consistent with the SM predictions [12–16]. A measurement of the \(\mathrm {p}\mathrm {p}\rightarrow {H} \rightarrow {\gamma }{\gamma } \) differential cross section as a function of kinematic observables probes possible deviations in distributions related to production, decay, and additional jet activity. It provides a check of perturbative calculations in quantum chromodynamics (QCD), and can point to alternative models in the Higgs sector. A similar analysis has been carried out by the ATLAS Collaboration in the diphoton and four-lepton decay channels [17–19].

Despite its small branching fraction of \(\approx \)0.2 % [20] predicted by the SM, the \({H} \rightarrow {\gamma }{\gamma }\) decay channel provides a clean final-state topology and a precise reconstruction of the diphoton mass. The dominant background arises from irreducible direct-diphoton production and from the reducible \(\mathrm {p}\mathrm {p}\rightarrow \gamma \)+jets and \(\mathrm {p}\mathrm {p}\rightarrow \text {jets}\) final states. The relatively high efficiency of the \({H} \rightarrow {\gamma }{\gamma }\) selection makes this final state one of the most important channels for observing and investigating the properties of the new boson.

In this paper, the cross section is measured as a function of the kinematic properties of the diphoton system, and, in events with at least one or two accompanying jets, also as a function of jet-related observables. Two isolated photons are required to be within pseudorapidities \(|\eta |<2.5\), and the photons with the largest and next-to-largest transverse momentum (\(p_{\mathrm {T}} ^{\gamma } \)) must satisfy the respective conditions of \(p_{\mathrm {T}} ^{\gamma }/m_{\gamma \gamma } >1/3\) and \({>}1/4\), where \(m_{\gamma \gamma }\) represents the diphoton mass. The transverse momentum \(p_{\mathrm {T}}^{\gamma \gamma } \) and the rapidity \(|y^{\gamma \gamma } | \) of the Higgs boson, observables related to the opening angle between the two photons, and the number of jets \(N_\text {jets} \) with \(p_{\mathrm {T}} > 25\) \(\,\text {GeV}\) produced in association with the diphoton system, are defined in this inclusive fiducial selection. A departure from the SM-predicted angular distributions would be an important observation, as it could reflect spin and parity properties [21] different from those expected in the SM.

The variables defined with at least one accompanying jet are sensitive to the transverse Lorentz boost of the diphoton system. A modification in the corresponding distributions or in the \(p_{\mathrm {T}}^{\gamma \gamma } \) spectrum could signify new contributions to gluon-gluon fusion production of the Higgs boson (ggH) [22]. The variables defined by requiring at least two accompanying jets are related to production of \({H} \rightarrow {\gamma }{\gamma }\) through vector boson fusion (VBF); however, given the low event yield after selecting two jets, no other selection is applied to enhance this production mechanism. This is different from what was done in Ref. [12], where an attempt was made to classify the events according to the production mechanism. The differential cross sections are therefore mainly sensitive to the dominant ggH production mode of the Higgs boson.

The data correspond to an integrated luminosity of 19.7\(\,\text {fb}^\text {-1}\) collected at the CERN LHC by the CMS experiment in proton-proton collisions at \(\sqrt{s}=8\,\text {TeV} \). The trigger requirements and vertex determination are identical to those of Ref. [12], while photon selection and event classification are modified to reduce their dependence on \(p_{\mathrm {T}} ^{\gamma } \) and \(\eta ^{\gamma }\), thereby providing a less model-dependent measurement. Photons are identified using a multivariate classifier that combines information on distributions of shower and isolation variables designed to be independent of \(p_{\mathrm {T}} ^{\gamma } \) and \(\eta ^{\gamma }\). The signal yield is extracted by fitting the \(m_{\gamma \gamma } \) distribution simultaneously in all bins of the observables. To improve the sensitivity of the analysis, the selected events are categorized using an estimator of the mass resolution that is not correlated with \(m_{\gamma \gamma }\), which simplifies the description of the background. Measured distributions are unfolded for detector effects and compared to distributions at the generator level from the latest Monte Carlo (MC) predictions.

The paper is organized as follows. After a brief description of the CMS detector and event reconstruction given in Sect. 2, and of the simulated samples in Sect. 3, the photon selection and event classification are detailed in Sect. 4, where we also describe the kinematic observables. Section 5 provides the statistical methodology for extracting the signal, and gives details on modelling signal and background, and on the unfolding procedure. Systematic uncertainties are detailed in Sect. 6. Unfolded results are then compared with theoretical predictions in Sect. 7, and a brief summary is given in Sect. 8.

2 The CMS detector

A full description of the CMS detector, together with a definition of the coordinate system and the relevant kinematic variables, can be found in Ref. [23]. Its central feature is a superconducting solenoid, 13\(\text {\,m}\) in length and 6\(\text {\,m}\) in diameter, which provides an axial magnetic field of 3.8\(\text {\,T}\). The core of the solenoid is instrumented with trackers and calorimeters. The steel flux-return yoke outside the solenoid is equipped with gas-ionisation detectors used to reconstruct and identify muons. Charged-particle trajectories are measured using silicon pixel and strip trackers, within \(|\eta | < 2.5\). A lead tungstate crystal electromagnetic calorimeter (ECAL) and a brass and scintillator hadron calorimeter (HCAL) surround the tracking volume and cover the region \(|\eta | < 3\). The ECAL barrel extends to \(|\eta | < 1.48\), while the ECAL endcaps cover the region \(1.48 < |\eta | < 3.0\). A lead and silicon-strip preshower detector is located in front of each ECAL endcap in the region \(1.65 < |\eta | < 2.6\). The preshower detector includes two planes of silicon sensors that measure the transverse coordinates of the impinging particles. A steel and quartz-fibre Cherenkov calorimeter extends the coverage to \(|\eta | < 5.0\). In the \((\eta , \phi )\) plane, for \(|\eta | < 1.48\), the HCAL cells map onto \(5\times 5\) ECAL crystal arrays to form calorimeter towers projecting radially outwards from points slightly offset from the nominal interaction point. In the endcap, the ECAL arrays matching the HCAL cells contain fewer crystals. To optimize the energy resolution, the calorimeter signals are calibrated and corrected for several detector effects [24]. Calibration of the ECAL uses the \(\phi \)-symmetry of the energy flow, photons from \(\pi ^{0}\rightarrow \gamma \gamma \) and \(\eta \rightarrow \gamma \gamma \), and electrons from \(\mathrm {W}\rightarrow \mathrm {e}\nu \) and \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-}\) decays. Changes in the transparency of the ECAL crystals due to irradiation during the LHC running periods and their subsequent recovery are monitored continuously and corrected, using light injected from a laser system.

Photons are reconstructed from clusters of energy deposits in the ECAL, grouped into so-called “superclusters” [25]. Events are selected using triggers requiring two photons with different thresholds in energy in the transverse plane (\(E_{\mathrm {T}} \)), respectively \(E_{\mathrm {T}} >26\) and \(>\)18\(\,\text {GeV}\) for the leading and subleading photons, together with two complementary photon selections. One selection requires a loose calorimetric identification based on the energy distribution in the electromagnetic cluster, and loose isolation requirements on the photon candidates. The other selection requires a photon candidate to have a high value of the \(R_\mathrm {9}\) variable, defined as the sum of the energies deposited in the array of \(3{\times }3\) crystals centred on the crystal with the highest energy deposition in the supercluster, divided by the energy of the supercluster. Photons that convert to \(\mathrm {e}^{+}\mathrm {e}^{-}\) pairs before reaching the calorimeter tend to have wider showers and smaller values of \(R_\mathrm {9}\) than unconverted photons. High trigger efficiency is maintained by requiring both photons to satisfy either selection. The measured trigger efficiency is 99.4 % for events satisfying the diphoton preselections described in Sect. 4.
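As an illustrative sketch (not part of the CMS reconstruction software), the \(R_\mathrm {9}\) definition above can be written as follows, under the simplifying assumption that the crystal energies are available as a rectangular local grid:

```python
import numpy as np

def r9(crystal_energies, supercluster_energy):
    """Energy in the 3x3 crystal array centred on the highest-energy (seed)
    crystal, divided by the supercluster energy."""
    seed_row, seed_col = np.unravel_index(np.argmax(crystal_energies),
                                          crystal_energies.shape)
    rows = slice(max(seed_row - 1, 0), seed_row + 2)
    cols = slice(max(seed_col - 1, 0), seed_col + 2)
    return crystal_energies[rows, cols].sum() / supercluster_energy
```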

3 Monte Carlo samples

The MC simulation of detector response employs a detailed description of the CMS detector, and uses Geant4 version 9.4.p03 [26]. Simulated events include additional \(\mathrm {p}\mathrm {p}\) collisions that take place in or close to the time span of the bunch crossing, and overlap the interaction of interest. The probability distribution of these pileup events is weighted to reproduce the observed number of interactions in data.

The MC samples for ggH and VBF processes use the next-to-leading order (NLO) matrix element generator powheg (version 1.0) [27–31] interfaced with pythia 6.426 [32]. The CT10 [33] set of parton distribution functions (PDF) is used in the calculation. The powheg generator is tuned following the recommendations of Ref. [34] and reproduces the Higgs boson \(p_{\mathrm {T}} \) spectrum predicted by the hqt calculation [35, 36]. The pythia 6 tune Z2* is used to simulate the hadronization and underlying event in pp collisions at 8\(\,\text {TeV}\). The Z2* tune is derived from the Z1 tune [37], which uses the CTEQ5L PDF, whereas Z2* adopts CTEQ6L1 [38]. The cross section for the ggH process is reduced by 2.5 % for all values of \(m_{{H}}\) to accommodate its interference with nonresonant diphoton production [39]. The pythia 6 generator is used alone for the VH (where V represents either the W or Z boson) and \({t}\overline{{t}}{H} \) processes with the CTEQ6L1 PDF [38] and Z2* tune. The SM cross sections and branching fractions are taken from Ref. [20].

The samples of Drell–Yan events (\({q} \overline{{q}} \rightarrow {Z}/\gamma ^{*} \rightarrow \ell ^{+}\ell ^{-}\), where \(\ell \) is a lepton), and background samples used to represent the diphoton continuum and processes where one of the photon candidates arises from misidentified jet fragments, are the same as used in Ref. [12]. Simulated samples of \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-}\), \({Z}\rightarrow {\mu ^{+}}{\mu ^{-}}\), and \({Z}\rightarrow {\mu ^{+}}{\mu ^{-}}{\gamma }\) events, used for comparison with data and to extract the photon energy scale and resolution corrections, are generated with MadGraph, sherpa, and powheg  [40], providing comparisons among the different generators. Simulated background samples are used for training of multivariate discriminants, and for defining selection and classification criteria.

The diphoton continuum processes involving two prompt photons are simulated using sherpa 1.4.2 [41]. The remaining processes where one of the photon candidates arises from misidentified jet fragments are simulated using pythia 6 alone.

A comparison of unfolded data with results from models for ggH using the MC generators hres [42, 43], powheg and powheg+minlo [44], and madgraph5_amc@nlo [45] is presented in Sect. 7.

4 Event selection and classification

Trigger requirements, vertex determination, and kinematic criteria on photons are unchanged relative to those given in Ref. [12]. The multivariate classifiers used to identify photons and to estimate mass resolution are also unchanged, but used in a different way. Instead of using a discriminant for photon identification as an input to the final diphoton kinematic discriminant, a requirement is set on the photon identification discriminant. Event classification, instead of being based on the output of the kinematic discriminant, is based on the estimated \(m_{\gamma \gamma } \) resolution. These differences are described in greater detail in the following section.

4.1 Photon identification

Photon candidates are required to be within the fiducial region of \(|\eta |<2.5\), excluding the barrel–endcap transition region of \(1.44 < |\eta | < 1.57\), where photon reconstruction is not optimal. The transverse momenta of the two photons are required to satisfy the previously mentioned conditions of \(p_{\mathrm {T}} ^{\gamma 1}/m_{\gamma \gamma } >1/3\) and \(p_{\mathrm {T}} ^{\gamma 2}/m_{\gamma \gamma } >1/4\). The use of \(p_{\mathrm {T}} \) thresholds scaled by \(m_{\gamma \gamma } \) prevents the distortion of the low end of the \(m_{\gamma \gamma } \) spectrum that results if a fixed threshold is used. Photons are also required to satisfy preselection criteria based on isolation and on distributions in shower variables, slightly more stringent than those used in the trigger.

Three variables are calculated for each reconstructed candidate for a \(\mathrm {p}\mathrm {p}\) interaction vertex: the sum of the \(p_{\mathrm {T}} ^2\) of the charged-particle tracks emerging from the vertex, and two variables that quantify the difference in the vector and scalar sums in \(p_{\mathrm {T}}\) between the diphoton system and the charged-particle tracks associated with the vertex. In addition, if either photon is associated with any charged-particle track identified as resulting from \(\gamma \rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) conversion, the extrapolated track trajectories are used to help identify the production vertex. These variables are used as inputs to a multivariate system based on a boosted decision tree (BDT) classifier to choose the reconstructed vertex to associate with the diphoton system. All BDTs are implemented using the tmva [46] framework.

Another BDT is trained to separate prompt photons from photon candidates resulting from misidentification of jet fragments passing the preselection requirements. Inputs to the BDT are variables related to the lateral spread of the shower, and isolation energies reconstructed from scalar sums in \(p_{\mathrm {T}} \) of charged particles and \(E_{\mathrm {T}} \) sums of photons in a cone with an opening angle \(\Delta R = \sqrt{{(\Delta \eta )^{2}+(\Delta \phi )^{2}}} <0.3\) around the photon, computed using the particle-flow (PF) algorithm [47, 48].
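A minimal sketch of such a cone isolation sum, assuming the PF candidates are available as plain arrays of \(p_{\mathrm {T}}\) (or \(E_{\mathrm {T}}\)), \(\eta \), and \(\phi \) rather than the actual PF data structures:

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = np.mod(phi1 - phi2 + np.pi, 2 * np.pi) - np.pi
    return np.sqrt((eta1 - eta2) ** 2 + dphi ** 2)

def cone_isolation(photon_eta, photon_phi, cand_pt, cand_eta, cand_phi, cone_size=0.3):
    """Scalar sum of candidate pT (or ET) within a cone around the photon direction."""
    dr = delta_r(photon_eta, photon_phi, cand_eta, cand_phi)
    return cand_pt[dr < cone_size].sum()
```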

The \(\eta \) and energy of the supercluster corresponding to the reconstructed photon are also included as input variables in the photon identification BDT. These variables are introduced to explicitly correlate the shower topology and isolation variables with \(\eta \) and \(p_{\mathrm {T}}\). Furthermore, during the BDT training, the \(\eta \) and \(p_{\mathrm {T}}\) background distributions are reweighted to match the distributions in the signal. As a result, for a given requirement on the BDT output, efficiencies for photon identification are almost independent of \(\eta \) and \(p_{\mathrm {T}}\), as can be seen in Fig. 1. This independence provides less model-dependent efficiency corrections for the comparison of data and MC expectations at the particle level. A loose selection is applied on the BDT output for photons detected in the barrel region, while a tight selection is applied to endcap photons, with respective mean efficiencies of 95 and 90 % for signal photons.

Photon efficiencies in signal samples are corrected for the difference in efficiency between data and simulation as measured with \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-}\) events, treating electrons as photons, with the electron \(R_\mathrm {9}\) distribution reweighted to match that of signal photons. Good agreement in the BDT output between data and simulation is found in \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-}\) and \({Z}\rightarrow {\mu ^{+}}{\mu ^{-}}{\gamma }\) events.

Fig. 1

Photon identification efficiencies for a Higgs boson with \(m_{{H}} =\)125\(\,\text {GeV}\), as a function of photon pseudorapidity (top), and photon transverse momentum (bottom)

4.2 Event classification using an estimator of mass resolution

Measuring differential kinematic distributions implies that events cannot be classified according to the diphoton kinematic BDT used for the inclusive measurement of Ref. [12], as that would create a bias in the result. To improve the performance beyond the simple classification using the \(R_\mathrm {9}\) variable in the reference sequential analysis, referred to as “cut-based analysis” in Ref. [12], an event categorization is introduced that is based on the estimated energy resolution.

The photon energies are corrected using a multivariate regression technique for the containment of showers in clustered crystals, for shower losses of photons that convert in the material upstream of the calorimeter, and for effects from pileup, based on shower variables and variables related to positions of photons in the detector studied in \(\gamma \)+jets simulated events. The energy response to photons is parameterized through an extended form of the Crystal Ball function [49] with a Gaussian core and two power law contributions. The regression provides an estimate of the parameters of the function, and therefore a prediction for the distribution in the ratio of the uncorrected supercluster energy to the true energy. The correction to the photon energy is taken as the inverse of the most probable value of this distribution. The standard deviation of the Gaussian core provides an estimate of the uncertainty in the energy (\(\sigma _E\)).

We define the estimator of the diphoton mass resolution by:

$$\begin{aligned} \frac{\sigma _m}{m_{\gamma \gamma }} = \frac{1}{2} \sqrt{ \left( \frac{\sigma _{E_1}}{E_1} \right) ^2 + S_1^2 + \left( \frac{\sigma _{E_2}}{E_2} \right) ^2 + S_2^2} \end{aligned}$$
(1)

where \(E_1\) and \(E_2\) are the corrected energies of the two photons from the regression, \(\sigma _{E_1}\) and \(\sigma _{E_2}\) are the uncertainties in the photon energies, and \(S_1\) and \(S_2\) are smearing terms depending on \(\eta \), \(R_\mathrm {9}\), and \(E_{\mathrm {T}} \), determined from \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-}\) events, needed for the simulation to match the energy resolution in data.
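A direct transcription of Eq. (1); how the smearing terms \(S_1\) and \(S_2\) are looked up as functions of \(\eta \), \(R_\mathrm {9}\), and \(E_{\mathrm {T}} \) is not shown here and is assumed to be done elsewhere:

```python
import math

def sigma_m_over_m(sigma_e1, e1, s1, sigma_e2, e2, s2):
    """Relative diphoton mass resolution estimator of Eq. (1)."""
    return 0.5 * math.sqrt((sigma_e1 / e1) ** 2 + s1 ** 2 +
                           (sigma_e2 / e2) ** 2 + s2 ** 2)
```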

For the typical energy range of photons in this analysis, the relative energy resolution \(\sigma _E/E\) depends on the energy, which in turn introduces a dependence of the mass resolution on the value of \(m_{\gamma \gamma }\). Therefore, a categorization based on the diphoton mass resolution introduces a distortion of the background mass spectrum. To obtain a smooth background description, we apply a transformation to the estimator \(\sigma _E\) to decorrelate \(\sigma _E/E\) from the energy.

The value of \(\sigma _E/E\) depends on \(\eta \) because of differences in the material in front of the ECAL and the inherent properties of the ECAL. To ensure that the decorrelation is performed independently of the \(\eta \) distribution of the training sample, in a first step a transformation is applied to make \(\sigma _E/E\) independent of both E and \(\eta \). A \(\gamma \)+jets MC sample, which covers a wide range in \(\eta \) and \(p_{\mathrm {T}} \), is used to build the fully decorrelated variable. The decorrelation is performed by making a change of variable, replacing the probability distribution in \(\sigma _E/E\) by its cumulative distribution function \(cdf(\sigma _E/E)\). The transformed variable follows a uniform distribution [50], and the transformation removes any correlation between \(\sigma _E/E\) and (E,\(\eta \)). Since the \(\eta \) dependence is removed, this variable is not suitable for estimating a per-event mass resolution, which is non-uniform in the detector. A second step is therefore introduced to restore the correlation of the energy resolution with photon \(\eta \) in a \({H} \rightarrow {\gamma }{\gamma }\) MC sample with \(m_{{H}} =123\) \(\,\text {GeV}\) (statistically independent of the simulated events used to model the signal), thereby recovering a dependence of \(\sigma _E\) on \(\eta \) for the photons of interest. In this step a new change of variable is performed, replacing the previous \(\sigma _E/E\) with the inverse of the cumulative distribution function of \(\sigma _E/E(\eta )\). The particular \(p_{\mathrm {T}} \) spectrum used for this step has only a modest impact on the correlation of \(\sigma _E\) with \(\eta \), which is in any case dominated by the material distribution in front of the ECAL and by the calorimeter performance. The advantage of this two-step procedure is that it yields a \(\sigma _E/E\) variable that is uncorrelated with E and does not depend on the \(\eta \) distribution of the training sample, while still providing the typical energy resolution at a given \(\eta \) in the detector. The final result can be interpreted as an estimator of the average energy resolution at a particular value of \(\eta \).
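The two-step decorrelation can be sketched as follows: a first empirical-cdf transform, evaluated in bins of (E, \(\eta \)) of a \(\gamma \)+jets training sample, maps \(\sigma _E/E\) onto a uniform variable; a second, inverse-cdf transform, evaluated in bins of \(\eta \) of an independent \({H} \rightarrow {\gamma }{\gamma }\) sample, restores the \(\eta \) dependence. The binning, the dictionary layout of the training samples, and the function names are illustrative assumptions, not the analysis implementation:

```python
import numpy as np

def cdf_transform(values, reference):
    """Quantile of each value within the reference sample (uniform if values ~ reference)."""
    ref = np.sort(reference)
    return np.searchsorted(ref, values, side="right") / float(len(ref))

def inverse_cdf(quantiles, reference):
    """Value of the reference distribution at the given quantiles."""
    ref = np.sort(reference)
    idx = np.clip((np.asarray(quantiles) * len(ref)).astype(int), 0, len(ref) - 1)
    return ref[idx]

def decorrelate(sigma_over_e, energy, eta, train_gjets, train_hgg, e_edges, eta_edges):
    """Two-step decorrelation of sigma_E/E.

    train_gjets: {(E-bin, eta-bin): array of sigma_E/E} from the gamma+jets sample
    train_hgg  : {eta-bin: array of sigma_E/E} from the independent H->gamma gamma sample
    """
    out = np.empty_like(np.asarray(sigma_over_e, dtype=float))
    e_bins = np.digitize(energy, e_edges)
    eta_bins = np.digitize(np.abs(eta), eta_edges)
    for i, (s, ib_e, ib_eta) in enumerate(zip(sigma_over_e, e_bins, eta_bins)):
        # step 1: uniformize conditional on (E, eta) using the gamma+jets sample
        u = cdf_transform(np.array([s]), train_gjets[(ib_e, ib_eta)])
        # step 2: restore the eta dependence using the H->gamma gamma sample
        out[i] = inverse_cdf(u, train_hgg[ib_eta])[0]
    return out
```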

The value of the mass resolution estimator, after decorrelation, is used to categorize events. The \(\sigma _m/m_{\gamma \gamma } \) distribution in \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) data shown in Fig. 2 indicates good agreement with the MC expectation over the whole range of \(\sigma _m/m_{\gamma \gamma } \). The double bump structure corresponds mainly to events where (Fig. 2a) both photons are in the central barrel (\(|\eta |<1.0\), \(\sigma _m/m_{\gamma \gamma } < 1\,\%\)), or at least one of the photons is in the outer barrel (\(1.00<|\eta |<1.44\), \(\sigma _m/m_{\gamma \gamma } > 1\,\%\)), and (Fig. 2b) one photon is in the barrel and one in the endcap, both having high values of \(R_\mathrm {9}\) (\(\sigma _m/m_{\gamma \gamma } < 1.5\,\%\)), and all the other events (\(\sigma _m/m_{\gamma \gamma } > 1.5\,\%\)). The class boundaries in \(\sigma _m/m_{\gamma \gamma } \) are optimized in MC samples simultaneously with the photon-identification working points, so as to maximize the signal significance. The photon identification efficiency is about 95 % in each category of \(\sigma _m/m_{\gamma \gamma } \). The category with the best energy resolution has \(\sigma _m/m_{\gamma \gamma } < 0.79\,\%\), and is composed of events with both selected photons in the central barrel. Both the second (\(0.79 < \sigma _m/m_{\gamma \gamma } < 1.28\,\%\)) and third categories (\(1.28 < \sigma _m/m_{\gamma \gamma } < 1.83\,\%\)) have at least one photon in the outer barrel or in the endcap. Events with \(\sigma _m/m_{\gamma \gamma } > 1.83\,\%\) do not provide a noticeable improvement in sensitivity and are therefore rejected. Classifying events in these three categories improves the analysis sensitivity by nearly 10 %.

It was verified in the simulation that the classification of events according to \(\sigma _m/m_{\gamma \gamma } \), after implementing the decorrelation procedure for \(\sigma _E\), does not produce a distortion of the background distribution in \(m_{\gamma \gamma }\).

Fig. 2

Mass resolution estimator \(\sigma _m/m_{\gamma \gamma } \) after the decorrelation procedure, in \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-}\) events in data (dots) and simulated events (histogram) with their systematic uncertainties (shaded bands), for barrel-barrel events (top) and all other events (bottom). The ratio of data to MC predictions is shown below each panel, and the error bars on each point represent the statistical uncertainties of the data

4.3 Jet identification

Jets are reconstructed from particles identified by the PF algorithm, clustered using the anti-\(k_{\mathrm {T}}\) algorithm [51] with a distance parameter of 0.5. Jet energy corrections account in particular for pileup, and are obtained from simulation. They are calibrated with in situ measurements using the energy balance studied in dijet and \({\gamma }/{Z}\)+jet events [52]. The jet momentum scale is found to be within 5–10 % of the true jet momentum over the whole spectrum and detector acceptance. The jet energy resolution is typically 15 and 8 % at 10 and 100\(\,\text {GeV}\), respectively. Mean resolutions of 10–15 % are observed in the respective regions of \(|\eta |<0.5\) and \(3<|\eta |<5\). Jets are selected if they satisfy the pileup jet identification criteria [53], and have \(p_{\mathrm {T}} ^{\mathrm {j}} >25\) \(\,\text {GeV}\). To minimize the inclusion of photon energy deposits in jets, the minimum distance between photons and jets is required to be \(\Delta R(\gamma ,\)j\()=\sqrt{\Delta \eta (\gamma ,\mathrm {j})^2 + \Delta \phi (\gamma ,\mathrm {j})^2}>0.5\), where \(\Delta \eta (\gamma ,\)j) and \(\Delta \phi (\gamma ,\)j) are the pseudorapidity and azimuthal angle differences between photons and jets.
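A sketch of the jet selection and photon-jet cleaning described above, assuming each jet record carries a precomputed pileup-identification flag (field names are illustrative):

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = math.remainder(phi1 - phi2, 2 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def select_jets(jets, photons, pt_min=25.0, dr_min=0.5):
    """Keep jets that pass the pT and pileup-ID requirements and are separated
    from both selected photons by more than dr_min.

    jets   : list of dicts with 'pt', 'eta', 'phi', 'passes_pileup_id'
    photons: list of dicts with 'eta', 'phi'
    """
    selected = []
    for jet in jets:
        if jet["pt"] <= pt_min or not jet["passes_pileup_id"]:
            continue
        if all(delta_r(g["eta"], g["phi"], jet["eta"], jet["phi"]) > dr_min
               for g in photons):
            selected.append(jet)
    return selected
```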

4.4 Fiducial phase space and observables

The generator-level fiducial volume is chosen to be close to that used in the selection of reconstructed events, and follows previous prescriptions: photons must have \(p_{\mathrm {T}} ^{\gamma 1}/m_{\gamma \gamma } >1/3\) and \(p_{\mathrm {T}} ^{\gamma 2}/m_{\gamma \gamma } >1/4\), with both photons within \(|\eta |<2.5\). The photons have to be isolated at the generator level, with \(\sum _{i} E_{\mathrm {T}i} < 10\) \(\,\text {GeV}\), where i runs over all the other generator-level particles in a cone \(\Delta R<0.4\) around the photons. This selection corresponds to a signal efficiency of 63 % in ggH, and almost 60 % considering other production mechanisms. We measure the kinematic observables using two-photon criteria, as well as requiring at least one or two jets.
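A sketch of this particle-level fiducial selection, assuming the photons and the other stable generator-level particles are available as simple records (field names are illustrative):

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = math.remainder(phi1 - phi2, 2 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def passes_fiducial(photon1, photon2, other_particles, m_gg):
    """photon1/photon2: leading/subleading photons with 'pt', 'eta', 'phi';
    other_particles: remaining generator-level particles with 'et', 'eta', 'phi'."""
    if abs(photon1["eta"]) >= 2.5 or abs(photon2["eta"]) >= 2.5:
        return False
    if photon1["pt"] / m_gg <= 1.0 / 3.0 or photon2["pt"] / m_gg <= 1.0 / 4.0:
        return False
    for gamma in (photon1, photon2):
        iso = sum(p["et"] for p in other_particles
                  if delta_r(gamma["eta"], gamma["phi"], p["eta"], p["phi"]) < 0.4)
        if iso >= 10.0:
            return False
    return True
```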

The transverse momentum \(p_{\mathrm {T}}^{\gamma \gamma } \) and absolute value of the rapidity \(|y^{\gamma \gamma } | \) of the Higgs boson are measured using the inclusive selection. Both \(p_{\mathrm {T}}^{\gamma \gamma } \) and \(|y^{\gamma \gamma } | \) probe the production mechanism, while the photon helicity angle \(\cos \theta ^{*}\) in the Collins–Soper frame [54] of the diphoton system and the difference in azimuth \(\Delta \phi ^{\gamma \gamma } \) between the two photons are related to properties of the decaying particles.
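The helicity angle in the Collins–Soper frame can be computed directly from the two photon momenta. The following sketch assumes massless photons given as (\(p_{\mathrm {T}}\), \(\eta \), \(\phi \)) and uses the standard expression \(\cos \theta ^{*} = 2(p_{z,1}E_2 - E_1 p_{z,2})/(m_{\gamma \gamma }\sqrt{m_{\gamma \gamma }^2 + p_{\mathrm {T}}^{\gamma \gamma \,2}})\):

```python
import math

def cos_theta_star_cs(pt1, eta1, phi1, pt2, eta2, phi2):
    """|cos(theta*)| in the Collins-Soper frame of the diphoton system."""
    e1, pz1 = pt1 * math.cosh(eta1), pt1 * math.sinh(eta1)
    e2, pz2 = pt2 * math.cosh(eta2), pt2 * math.sinh(eta2)
    px = pt1 * math.cos(phi1) + pt2 * math.cos(phi2)
    py = pt1 * math.sin(phi1) + pt2 * math.sin(phi2)
    pt_gg2 = px ** 2 + py ** 2
    m_gg2 = (e1 + e2) ** 2 - pt_gg2 - (pz1 + pz2) ** 2
    return abs(2.0 * (pz1 * e2 - e1 * pz2)
               / (math.sqrt(m_gg2) * math.sqrt(m_gg2 + pt_gg2)))
```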

The number of jets \(N_\text {jets} \), the transverse momentum \(p_{\mathrm {T}} ^\mathrm {j1} \) of the jet with largest (leading) \(p_{\mathrm {T}} \) in the event, and the rapidity difference between the Higgs boson and the leading jet \(|y^{\gamma \gamma }-y^{\mathrm {j1}} | \) are defined after requiring at least one jet with \(p_{\mathrm {T}} >25\) \(\,\text {GeV}\) to lie within \(|\eta |<2.5\). The difference in rapidities between the diphoton system and the leading jet provides a sensitive probe of any new contributions to the ggH process.

Requiring at least two jets with \(p_{\mathrm {T}} >25\) \(\,\text {GeV}\) and \(|\eta |<4.7\) provides the basis for defining the following observables: the dijet mass \(m_\mathrm {jj} \), the azimuthal angle difference of the two jets \(\Delta \phi ^{\mathrm {jj}} \), the difference in pseudorapidity \(\Delta \eta ^{\mathrm {jj}} \) between the leading and subleading jet, the Zeppenfeld [55] variable \(|\eta ^{\gamma \gamma }-(\eta ^{\mathrm {j1}}+\eta ^{\mathrm {j2}})/2 | \), and the difference in azimuth between the Higgs boson and the dijet system \(\Delta \phi ^{\gamma \gamma ,\mathrm {jj}} \). A requirement of \(|\eta ^{\text {j}} |<2.5\) is applied in the single-jet selection, as this selection aims to probe primarily the ggH process, while the two-jet selection has a requirement \(|\eta ^{\text {j}} |<4.7\) since it is oriented toward VBF.
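A sketch of the dijet observables listed above, with the leading and subleading jets treated as massless and the diphoton system given by its \(\eta \) and \(\phi \) (field names are illustrative):

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into [0, pi]."""
    return abs(math.remainder(phi1 - phi2, 2 * math.pi))

def dijet_observables(j1, j2, diphoton):
    """Return m_jj, dphi_jj, deta_jj, the Zeppenfeld variable, and dphi(gg, jj)."""
    px = j1["pt"] * math.cos(j1["phi"]) + j2["pt"] * math.cos(j2["phi"])
    py = j1["pt"] * math.sin(j1["phi"]) + j2["pt"] * math.sin(j2["phi"])
    pz = j1["pt"] * math.sinh(j1["eta"]) + j2["pt"] * math.sinh(j2["eta"])
    e  = j1["pt"] * math.cosh(j1["eta"]) + j2["pt"] * math.cosh(j2["eta"])
    m_jj = math.sqrt(max(e ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0))
    dphi_jj = delta_phi(j1["phi"], j2["phi"])
    deta_jj = abs(j1["eta"] - j2["eta"])
    zeppenfeld = abs(diphoton["eta"] - 0.5 * (j1["eta"] + j2["eta"]))
    dphi_gg_jj = delta_phi(diphoton["phi"], math.atan2(py, px))
    return m_jj, dphi_jj, deta_jj, zeppenfeld, dphi_gg_jj
```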

The bin boundaries for each kinematic observable are optimized to achieve similar relative statistical uncertainties in the expected cross section in each bin, namely 60 % for each bin in the inclusive observables, 70 % in the one-jet observables, and more than 100 % in the two-jet observables. The relative statistical uncertainty expected in the fiducial cross section is 30 %.

5 Extraction of signal and unfolding of detector effects

The signal yield is extracted by fitting the \(m_{\gamma \gamma } \) distribution using a signal model based on simulated events, and a background model determined in the fit to the data. The statistical methodology is similar to Ref. [12], and for each observable the fit is performed simultaneously in all the bins. The reconstructed yields are corrected for detector effects by including the response matrix in the fit.

5.1 Models for signal

Signal models are constructed for each class of \(\sigma _m/m_{\gamma \gamma } \) events, and for each production mechanism, from a fit to the simulated \(m_{\gamma \gamma }\) distribution, after applying the corrections determined from comparisons of data and simulation for \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) (also checked with \({Z}\rightarrow {\mu ^{+}}{\mu ^{-}}{\gamma } \)) events, for \(m_{{H}} =120,125\), and \(130\,\text {GeV} \). Mass distributions for the best and worst choices of diphoton vertex, corresponding to the highest and lowest scores in the vertexing BDT [12], are fitted separately. Good descriptions of the distributions can be achieved using sums of Gaussian functions, where the means are not required to be identical. As many as four contributing Gaussian functions are used, although in most cases two or three provide an acceptable fit. Models for intermediate values of \(m_{{H}}\) are obtained by linear interpolation of the fitted parameters.
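A sketch of such a signal model: a normalized sum of Gaussians whose parameters are interpolated linearly between the three simulated mass points. The parameter layout and values are placeholders, not the fitted ones:

```python
import numpy as np

MASS_POINTS = [120.0, 125.0, 130.0]

def interpolate_params(fitted, m_h):
    """fitted: {mass: [(fraction, mean, sigma), ...]} from fits to simulation."""
    n_gauss = len(fitted[MASS_POINTS[0]])
    params = []
    for k in range(n_gauss):
        frac  = np.interp(m_h, MASS_POINTS, [fitted[m][k][0] for m in MASS_POINTS])
        mean  = np.interp(m_h, MASS_POINTS, [fitted[m][k][1] for m in MASS_POINTS])
        sigma = np.interp(m_h, MASS_POINTS, [fitted[m][k][2] for m in MASS_POINTS])
        params.append((frac, mean, sigma))
    return params

def signal_shape(m_gg, params):
    """Normalized sum of Gaussians evaluated at the diphoton mass values m_gg."""
    m_gg = np.asarray(m_gg, dtype=float)
    total = sum(p[0] for p in params)
    pdf = np.zeros_like(m_gg)
    for frac, mean, sigma in params:
        pdf += (frac / total) * np.exp(-0.5 * ((m_gg - mean) / sigma) ** 2) \
               / (sigma * np.sqrt(2.0 * np.pi))
    return pdf
```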

5.2 Statistical methodology

After implementing the above-described selection requirements on photon candidates, a simultaneous binned maximum likelihood fit is performed to the diphoton invariant mass distributions in all the event classes over the range \(100<m_{\gamma \gamma } <180\,\text {GeV} \) for each differential observable. The test statistic chosen to measure signal and background contributions in data is based on the profile likelihood ratio [56, 57]. Systematic uncertainties are incorporated into the analysis via nuisance parameters and treated according to the frequentist paradigm.

We use the same discrete profiling method to fit the background contribution [58] as used in extracting the main \({H} \rightarrow {\gamma }{\gamma }\) result [12]. The background is evaluated by fitting the \(m_{\gamma \gamma } \) distribution in data, without reference to MC simulation. Thus the likelihood to be evaluated in a signal+background hypothesis is

$$\begin{aligned} \mathcal{L} (\mu _i)=\mathcal{L} (\text {data}|s_i(\mu _i,m_{\gamma \gamma })+f_i(m_{\gamma \gamma })), \end{aligned}$$
(2)

where \(\mu _i\) is the signal strength (ratio of measured to expected yields) in the bin i of a differential distribution that is varied in the fit, \(s_i(\mu _i,m_{\gamma \gamma })\) represents the model for signal, and \(f_i(m_{\gamma \gamma })\) the fitted background functions.

The choice of function used to fit the background in any particular event class is included as a discrete nuisance parameter in the formulation used to extract the result. Exponentials, power-law functions, polynomials in the Bernstein basis, and Laurent polynomials are used to represent \(f(m_{\gamma \gamma })\). When fitting a signal+background hypothesis to the data, by minimizing the value of twice the negative logarithm of the likelihood, all functions in these families are tried, with a “penalty” term added to account for the number of free parameters in the fit. The penalized likelihood function \(\widetilde{\mathcal{L}} _f\) for a single fixed background fitting function f is defined through

$$\begin{aligned} -2\,\ln \widetilde{\mathcal{L}} _f=-2\,\ln \mathcal{L} _f+N_{f}, \end{aligned}$$
(3)

where \(\mathcal{L} _f\) is the “unpenalized” likelihood function, and \(N_{f}\) is the number of free parameters in f. The full set of \(\mu _i\), denoted by \(\mathbf {\mu }\), is determined by minimizing the likelihood ratio:

$$\begin{aligned} L(\mathbf {\mu })=-2\,\ln \frac{\widetilde{\mathcal{L}} (\text {data}|\mathbf {\mu },\hat{\theta }_{\mathbf {\mu }},\hat{f}_{\mathbf {\mu }})}{\widetilde{\mathcal{L}} (\text {data}|\hat{\mathbf {\mu }},\hat{\theta }_{\hat{\mathbf {\mu }}},\hat{f}_{\hat{\mathbf {\mu }}})}, \end{aligned}$$
(4)

where the numerator represents the maximum of \(\widetilde{\mathcal{L}} \) for a given \(\mathbf {\mu }\), achieved for the best-fit values of the nuisance parameters \(\theta _{\mathbf {\mu }}=\hat{\theta }_{\mathbf {\mu }}\), and a particular background function \(f_{\mathbf {\mu }}=\hat{f}_{\mathbf {\mu }}\). The denominator corresponds to the global maximum of \(\widetilde{\mathcal{L}} \), where \(\mathbf {\mu }=\hat{\mathbf {\mu }}\), \(\theta _{\mathbf {\mu }}=\hat{\theta }_{\hat{\mathbf {\mu }}}\), and \(f_{\mathbf {\mu }}=\hat{f}_{\hat{\mathbf {\mu }}}\). In each family, the number of degrees of freedom (number of exponentials, number of terms in the series, degree of the polynomial, etc.) is increased until no significant improvement (p value \(<\)0.05 obtained from the F-distribution [59]) occurs in the likelihood when going from N to N + 1 degrees of freedom in the fit to data.
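As an illustration of this stopping criterion, the sketch below fits ordinary polynomials of increasing degree to a binned \(m_{\gamma \gamma } \) spectrum with a chi-square fit and stops when the F-test p value is no longer below 0.05. The actual analysis instead performs maximum-likelihood fits of the Bernstein, Laurent, exponential, and power-law families:

```python
import numpy as np
from scipy import stats

def choose_polynomial_order(bin_centers, counts, max_order=7, p_threshold=0.05):
    """Increase the polynomial degree until the chi2 improvement between
    degree N and N+1 is no longer significant according to an F-test."""
    errors = np.sqrt(np.maximum(counts, 1.0))
    n_bins = len(counts)

    def chi2(order):
        coeffs = np.polyfit(bin_centers, counts, order, w=1.0 / errors)
        model = np.polyval(coeffs, bin_centers)
        return np.sum(((counts - model) / errors) ** 2)

    order = 1
    chi2_prev = chi2(order)
    while order < max_order:
        chi2_next = chi2(order + 1)
        ndof = n_bins - (order + 2)          # residual dof of the larger fit
        f_stat = (chi2_prev - chi2_next) / (chi2_next / ndof)
        p_value = 1.0 - stats.f.cdf(f_stat, 1, ndof)
        if p_value >= p_threshold:
            break
        order += 1
        chi2_prev = chi2_next
    return order
```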

For a given observable, the fit is performed simultaneously over all the bins, and the nuisance parameters are profiled in the fit. The signal mass is also considered a nuisance parameter and profiled for each observable. This choice is made to avoid using the same data twice, first to measure the signal mass, then to measure the kinematic distribution at this mass. As a consequence the differential cross section for each observable is evaluated at slightly different best fit values of the signal mass (see Table 1 in appendix A). As an example, Fig. 3 shows the sum of the fit to the events of the three \(\sigma _{m}/m_{\gamma \gamma } \) classes in the fiducial phase space measurement, under the signal+background hypothesis (S+B), weighted by S/(S+B) separately in each category.

Fig. 3

Sum of the signal + background (S + B) model fits to the events of the three \(\sigma _{m}/m_{\gamma \gamma } \) classes in the fiducial phase space measurement, weighted by S/(S + B) separately in each category, together with the data binned as a function of \(m_{\gamma \gamma } \). The 1 and 2 standard deviation bands of uncertainty (labeled as 1\(\sigma \) and 2\(\sigma \)) shown for the background component include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. The bottom panel shows the result after subtracting the background component

The uncertainty in the expected signal strength for the fiducial cross section is \(\sigma _{\hat{\mu }} = 0.32\) in the present analysis, compared with \(\sigma _{\hat{\mu }} = 0.26\) for the reference analysis using the 8\(\,\text {TeV}\) data. The reference analysis [12] classifies events using exclusive categories dedicated to measuring signal production in associated mechanisms, while the present analysis is performed inclusively.

5.3 Unfolding detector effects

The measurement is performed simultaneously in all bins, together with the unfolding of the detector effects to the particle level. The same kind of procedure was used to extract the signal strength in untagged and dijet-tagged categories to measure the couplings of the Higgs boson to vector bosons and fermions [12]. This procedure uses asymmetric uncertainties in the full likelihood instead of being limited to Gaussian uncertainties computed with a covariance matrix.

The unfolding of reconstructed distributions is based on models of response matrices \(K_{ij}\) in the three \(\sigma _m/m_{\gamma \gamma } \) categories for each observable to be measured. The \(K_{ij}\) are constructed in the simulation to give the probability of measuring a reconstructed event in bin j, given that it was generated in bin i. The models for contributions of signal are contained in each element of the response matrix, which is mostly diagonal, but can have non-negligible bin-by-bin migration resulting in off-diagonal contributions.

The unfolding is performed using a maximum likelihood technique, adapted to combine the measurement in different categories. Regularization is not used, so as not to shift the best-fit value while artificially decreasing the uncertainties. This leads to minimizing the following conditional log-likelihood expression:

$$\begin{aligned} \mathcal {F} (\mathbf {\mu } ) = -2 \sum _j \log \mathcal {L} \left( \sum _i K_{ij} \mu _i N^{\text {gen}}_{i}|N^{\text {reco}}_{j} \right) \end{aligned}$$
(5)

where \(\mathcal {L}\) is the likelihood of Eq. (2), \(\mu _i N^{\text {gen}}_{i}\) is the unknown unfolded particle-level distribution, \(\mu _i\) is the unknown signal strength at particle level, \(N^{\text {gen}}_{i}\) is the particle-level distribution in the simulated kinematic observable, and \(N^{\text {reco}}_{j}\) is the number of events in each bin of the measured distribution. The indices i refer to particle-level bins, while the indices j refer to reconstructed-level bins in the three \(\sigma _m/m_{\gamma \gamma } \) categories.
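A minimal sketch of the signal part of Eq. (5): a Poisson likelihood at the reconstructed level, with the expected yields obtained by folding the particle-level spectrum through the response matrix, and the particle-level signal strengths as free parameters. The background model, the event categories, and the nuisance parameters of the full fit are omitted:

```python
import numpy as np
from scipy.optimize import minimize

def unfold(K, n_gen, n_reco):
    """Fit the particle-level signal strengths mu_i by minimizing -2 ln L.

    K[i, j]  : probability for an event generated in bin i to be reconstructed in bin j
    n_gen[i] : particle-level yields expected from simulation
    n_reco[j]: observed reconstructed yields
    """
    def neg2_log_likelihood(mu):
        lam = (mu * n_gen) @ K           # expected reconstructed yields
        lam = np.clip(lam, 1e-9, None)   # protect the logarithm
        return -2.0 * np.sum(n_reco * np.log(lam) - lam)

    result = minimize(neg2_log_likelihood, x0=np.ones(len(n_gen)), method="Nelder-Mead")
    return result.x
```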

The expected reconstructed signal at detector level is given by the vector \(J_{j}=\sum _i K_{ij} N^{\text {gen}}_{i}\), where each entry of \(K_{ij}\) therefore corresponds to a set of signal models, computed by interpolating the matrix between the generated mass points, weighted by the signal efficiency interpolated at the same mass. The matrix is a function of \(m_{{H}} \), the model for the generated bin i and the reconstructed bin and category j, as well as all the nuisance parameters. Events falling outside the acceptance are taken into account by adding an extra bin.

The maximum likelihood fit is performed simultaneously for the diphoton background and the signal strength in each generator bin, as described in Sect. 5.2. The out-of-acceptance bin is left fixed in the fit, because there are only N bins at the reconstructed level, where the detector is able to perform the measurement, and it is therefore not possible to determine an extra unknown at the particle level (\(N+1\) unknowns), outside of the detector acceptance. To restore the correct cross section normalization in the fiducial region, the generator distributions for the variables of interest (\(N^{\text {gen}}_{i}\) at the fitted mass point \(m_{{H}} \)) are multiplied by the measured set of signal strengths (\(\mu _i\)).

The enhancement of the statistical uncertainties due to the presence of off-diagonal elements in the response matrix is small for the inclusive observables, while it is non-negligible for the one-jet or two-jet observables. A negative number of events is measured in only one bin of the rapidity difference between the Higgs boson and the leading jet, in the last bin of the dijet mass distribution, and in two bins of the azimuthal difference between the Higgs boson and the dijet system. In each case, within the uncertainties, the result is compatible with zero.

The model dependence introduced by the unfolding is checked using the following procedure. The same model used for the SM Higgs boson is kept in the unfolding matrix, leaving the expected yield for ggH unchanged. The expected number of signal events arising from associated production mechanisms is altered by 50 % in the fit. This change introduces a redistribution of the events in the \(\sigma _m/m_{\gamma \gamma } \) categories relative to the nominal analysis. The change in the fiducial cross section is less than 5 % of its statistical uncertainty. In general, the impact in each bin of the measured differential distributions can be up to 5 and 10 % of the statistical uncertainty in the respective bins of the inclusive and jet observables.

6 Systematic uncertainties

Systematic uncertainties listed in this section are included in the likelihood as nuisance parameters and are profiled during the minimization. Unless specified to the contrary, the sources of uncertainty refer to the individual quantity studied, and not to the final yield. The precision of the present measurement is however dominated by statistical uncertainties.

The sources of uncertainty assigned to all events can be summarised as follows. The uncertainty in the integrated luminosity is estimated as described in Ref. [60], and amounts to 2.6 % of the signal yield in the data. The uncertainty in the vertex-finding efficiency is taken from the difference observed between data and simulation in \({Z}\rightarrow {\mu ^{+}}{\mu ^{-}} \) events, following removal of the muon tracks to mimic a diphoton event. A 1 % uncertainty is added to account for the activity from charged-particle tracks in signal, estimated by changing the underlying event tunes in ggH events; another uncertainty of 0.2 % accounts for the uncertainty in the \(m_{\gamma \gamma }\) distribution of signal events. The uncertainty in the trigger efficiency is extracted from \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) events using a “tag-and-probe” technique [61]. A rescaling of the \(R_\mathrm {9}\) distribution is used to take into account the difference between electrons and photons, and a total uncertainty of 1 % is assigned to this source.

The following correspond to systematic uncertainties related to individual photons. The uncertainty in the energy scale of photons is assessed using simulated samples in which the amount of tracker material is increased uniformly by 10 % in the central barrel, where the material is known with best precision, and by 20 % outside this region. These values were chosen as upper limits on the additional material, as derived from the data. The resulting uncertainty in the photon energy ranges from 0.03 % in the central ECAL barrel up to 0.3 % in the outer endcap. Additional uncertainties of 0.015 % arise from the modelling of the fraction of scintillation light reaching the photodetector, and from nonuniformities in the radiation-induced loss of transparency of the crystals. A small uncertainty of 0.05 % is added to account for the modelling of electromagnetic showers in \({\textsc {geant}} 4\) version 9.4.p03.

Possible differences between MC simulation and data in the extrapolation of shower energies typical of electrons from \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) decays to those typical of photons from \({H} \rightarrow {\gamma }{\gamma } \) decays have been investigated with \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) and \(\mathrm {W}\rightarrow \mathrm {e}\nu \) data. Differential nonlinearity in the measurement of photon energies affects the diphoton mass by up to 0.1 % for \(m_{\gamma \gamma } \approx 125\,\text {GeV} \).

The energy scale and resolution in data are measured with electrons from \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) decays. Systematic uncertainties in the method are estimated as a function of \(|\eta |\) and \(R_\mathrm {9}\). The uncertainties range from 0.05 % for unconverted photons in the central ECAL barrel, to 0.1 % for converted photons in the outer endcaps of the ECAL. Finally, there is an overall uncertainty that accounts for possible mismodelling of the \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) line shape in simulation.

The uncertainties in the BDT discriminant for photon identification and in the estimate of photon energy resolution are discussed together since they are studied in the same way. The dominant underlying cause of the observed differences between data and simulation is the simulation of the energy distribution in the shower. The combined contribution of the uncertainties in these two quantities dominates the experimental contribution to the systematic uncertainty in signal strength. The agreement between data and simulation is examined when photon candidates are electron showers reconstructed as photons in \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) events, photons in \({Z}\rightarrow {\mu ^{+}}{\mu ^{-}}{\gamma } \) events, and leading photons in preselected diphoton events where \(m_{\gamma \gamma } >160\,\text {GeV} \). A change of \(\pm 0.01\) in the value of the photon identification discriminant, together with an uncertainty in the estimated photon energy resolution parameterized, respectively as a rescaling of the resolution estimate by \(\pm 5\,\%\) and \(\pm 10\,\%\) about its nominal value in the barrel and in the endcap, fully cover the differences observed in all three of these data samples.

The uncertainty in the preselection efficiency is taken as the uncertainty in the data/MC scale factors measured using \({Z}\rightarrow \mathrm {e}^{+}\mathrm {e}^{-} \) events with a tag-and-probe technique.

Jet observables are affected by systematic uncertainties arising from jet identification, jet energy scale, and resolution. For the jet observables, the systematic uncertainty associated with the algorithm used to reject pileup jets is less than 1 % and is neglected. For jets within \(|\eta |<2.5\), the energy scale uncertainty is \(\approx \)3 % at 30\(\,\text {GeV}\), and decreases quickly with increasing jet \(p_{\mathrm {T}} \). The impact of this uncertainty on the cross section is 1–5 %, increasing with the number of jets. Dijet observables for forward jets (up to \(|\eta |<4.7\)) have the worst energy resolution and a scale uncertainty of \(\approx \)4.5 % at 30\(\,\text {GeV}\). The impact of these uncertainties on the cross section ranges from 5 to 11 %, increasing in kinematic regions where the jet \(\eta \) is large. Small contributions from corrections in jet energy resolution have an impact of less than 1 % and are neglected.

7 Comparison of data with theory

7.1 Theoretical predictions

The unfolded data are compared with the hres [42, 43], \({\textsc {powheg}} \) [27–30], powheg+minlo  [44], and madgraph5_amc@nlo  [45, 62] MC generators for ggH production.

The hres parton-level generator corresponds to next-to-next-to-leading order (NNLO) accuracy in perturbative QCD, with next-to-next-to-leading logarithm soft-gluon resummation; hres v.2.3 assumes finite bottom and top quark masses, using respective values of \(m_b=4.75\) \(\,\text {GeV}\) and \(m_t=175\) \(\,\text {GeV}\). The renormalization scale \(\mu _R\) and factorization scale \(\mu _F\) are set to \(m_{{H}} =125\) \(\,\text {GeV}\), while the resummation scale is set to \(m_{{H}}/2\). The MSTW2008NNLO [63] PDF is used for the central value, and its 68% confidence level eigenvectors for computing the uncertainty (following the LHAPDF [64] recipe). The dependence on scale is evaluated by changing independently both the renormalization and factorization scale up and down by a factor of two around the central value \(m_{{H}} \), and not considering simultaneous changes such as \(\mu _R=m_{{H}}/2\) and \(\mu _F=2 m_{{H}} \). Because hres cannot be interfaced to a parton-shower program, no isolation is applied at the partonic level. A nonperturbative correction must be applied to the distributions to correct for the efficiency loss due to isolation requirements in the presence of parton shower and underlying event. The nonperturbative correction is evaluated from the mean of the isolation efficiencies computed with \({\textsc {powheg}} \) and madgraph5_amc@nlo (as described below), estimated to be 3.1 % (up to 5 % in some bins). The uncertainty is taken as half of the envelope, which is between 0.5 and 5 %, depending on the kinematical region.
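A sketch of this scale-uncertainty evaluation: the prediction is recomputed for independent factor-of-two variations of \(\mu _R\) and \(\mu _F\), the two anti-correlated combinations are discarded, and the envelope around the nominal value is taken as the uncertainty. The numbers below are placeholders, not the actual predictions:

```python
import itertools

def scale_uncertainty(xsec):
    """xsec: dict mapping (kR, kF) scale factors to the computed cross section,
    with (1.0, 1.0) the nominal choice muR = muF = m_H."""
    nominal = xsec[(1.0, 1.0)]
    allowed = [value for (kr, kf), value in xsec.items()
               if (kr, kf) not in [(0.5, 2.0), (2.0, 0.5)]]
    return max(allowed) - nominal, nominal - min(allowed)

# illustrative usage with placeholder cross sections (in fb)
variations = {scales: 31.0 for scales in itertools.product([0.5, 1.0, 2.0], repeat=2)}
up, down = scale_uncertainty(variations)
```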

The \({\textsc {powheg}} \) parton-level generator implements NLO calculations [27] interfaced to parton shower programs. Samples of events with a Higgs boson with \(m_{{H}} =125\) \(\,\text {GeV}\) produced via ggH, assuming an infinite top quark mass, are generated and hadronized with pythia 6.4. A sample of events with a Higgs boson produced in association with just a single jet, called \({\textsc {powheg}} \) HJ, is also generated with powheg+minlo  [44]. This sample has NLO accuracy for 0-jet and 1-jet production, while it is only leading-order (LO) for 2-jet final states. Both samples set the damping factor \(\textit{hfact}\) in \({\textsc {powheg}} \) at 100\(\,\text {GeV}\) to reproduce the predicted \(p_{\mathrm {T}} \) distribution of the Higgs boson from hqt [35, 36]. This factor minimizes emission of extra jets beyond those in the matrix element in the limit of large \(p_{\mathrm {T}} \), and enhances contribution from the \({\textsc {powheg}} \) Sudakov form factor as \(p_{\mathrm {T}} \) approaches 0. The CT10 PDF and pythia 6 tune Z2* are used in the calculation. Theoretical uncertainties are computed in the same way as described for hres.

The madgraph5_amc@nlo matrix element generator is capable of generating LO and NLO processes [62]. The ggH process is generated using the NLO Higgs characterization model [65], with effective coupling of the Higgs boson to gluons in the infinite top quark mass limit. Gluon fusion is generated with 0, 1, or 2 additional jets at NLO in the Born matrix element, and combined using FxFx merging [66]. Samples are generated using the CT10 PDF, and showered using pythia 8.185 [67] with the 4C tune [68]. A nominal merging scale of 30\(\,\text {GeV}\) is used for the additional jet multiplicities. The effect of changing the merging scale from 20 to 60\(\,\text {GeV}\) is very small compared to uncertainties in scale and in the choice of PDF. Uncertainties from renormalization and factorization scales are evaluated in the same way as with hres and \({\textsc {powheg}} \).

Theoretical predictions for associated production mechanisms are computed with the following generators. \({\textsc {powheg}} \) interfaced with pythia 6 is used for VBF, while standalone pythia 6 is used for VH and \({t}\overline{{t}}{H} \). In the following, the notation XH refers to the sum of the VBF, VH, and \({t}\overline{{t}}{H} \) predictions for these generators. The ggH predictions from hres, \({\textsc {powheg}} \), madgraph5_amc@nlo, and powheg+minlo, as well as the XH predictions, are normalized to the total cross sections from Ref. [20].

Along with the SM predictions, the following alternative models are considered. A spin-2 minimal model \(2^{+}_{m}\) (graviton-like) is produced through two mechanisms, ggH and \({q} \overline{{q}} \) annihilation, based on the jhugen generator [21, 69] and normalized to the total SM cross section. The main changes relative to the SM are expected in the inclusive Collins–Soper \(\cos \theta ^{*}\) angular distribution in the \(\gamma \gamma \) rest frame, which provides maximum information on the spin of the \(\gamma \gamma \) system. The two spin-2 samples are compared to the data in the \(\cos \theta ^{*}\) observable. Anomalous couplings parametrized with the \(O_W\) dimension-6 operator in linearly realized effective field theory [70] are also considered, and implemented through the Universal FeynRules Output [71] in MadGraph 5. The \(O_W\) operator is related to the anomalous triple gauge coupling parameter \(\Delta g_1^Z\) [72]. The values of the Wilson coefficients are \(F_W = -5 \times 10^{-5}\,\text {GeV} ^{-4}\) and \(F_W = +5 \times 10^{-5}\,\text {GeV} ^{-4}\), both corresponding to \(\Delta g_1^Z = 0.21\), a value approximately five times the size of the limits set by LHC diboson measurements [73–76]. Both values modify the kinematic distributions of the Higgs boson in the VBF process toward larger \(p_{\mathrm {T}} \), and the \(F_W = -5 \times 10^{-5}\,\text {GeV} ^{-4}\) value also increases the VBF cross section by approximately a factor of 3.

All the predictions are generated at \(m_{{H}} =125\) \(\,\text {GeV}\). For each observable, a correction factor is applied in each bin of the differential cross section to correct for the mass difference of the generated sample relative to the measured \(m_{{H}} \) in data. The correction is computed with powheg +pythia 8 for both the ggH and VBF processes, and with pythia 6 for VH and \({t}\overline{{t}}{H} \) processes. It amounts to less than 1 % for all bins, and integrates to a 0.8 % effect in the fiducial cross section.

7.2 Results

The fiducial cross section, inclusive in the number of jets, is measured to be:

$$\begin{aligned} \sigma _{\text {obs}} = 32^{+10}_{-10}\,\text {(stat)} ^{+3}_{-3}\,\text {(syst)} \text {\,fb}, \end{aligned}$$

where the uncertainties reflect statistical and systematic contributions added in quadrature. This can be compared with the following SM predictions:

$$\begin{aligned} \sigma _{{\textsc {hres}} +\mathrm {XH}}&= 31 ^{+4}_{-3}\text {\,fb},\\ \sigma _{{\textsc {powheg}} +\mathrm {XH}}&= 32 ^{+6}_{-5}\text {\,fb},\\ \sigma _{{\textsc {madgraph5}\_{a}\textsc {mc@nlo}} +\mathrm {XH}}&= 30 ^{+6}_{-5}\text {\,fb}. \end{aligned}$$

Uncertainties in the predicted cross sections include contributions from renormalization and factorization scales, choice of PDF, and branching fraction. The hres prediction also includes uncertainties from nonperturbative corrections.

The observed fiducial cross section agrees with the predicted values. The measurement precision is dominated by statistical uncertainties. The relative systematic uncertainty of 9 % is almost negligible relative to the statistical uncertainty of 30 %. The experimental uncertainty is larger than the theoretical one by a factor of up to about two. The ratio of the measured cross section to the predictions for \({\textsc {powheg}} \)+\(\mathrm {XH}\) is in good agreement with the signal strength observed in Ref. [12].

The measured differential cross sections observed in data, given for each bin by \(\mu _i N^{\text {gen}}_{i}\), are compared with predictions for inclusive production in Fig. 4, and for jet observables in Figs. 5 and 6. The total theoretical uncertainty included in these comparisons is computed by adding in an uncorrelated way the uncertainties in the choice of PDF, renormalization and factorization scale, and the branching fraction. The uncertainties in the ggH mechanism, PDF choice, and the renormalization and factorization scales are computed with hres, powheg, powheg+minlo and madgraph5_amc@nlo, as described above, while the branching fraction for all production mechanisms, as well as the scale and PDF for the associated production mechanisms are taken from Ref. [20]. Distributions for inclusive observables computed with hres, powheg, and madgraph5_amc@nlo, and jet-related observables computed with powheg, powheg+minlo and madgraph5_amc@nlo, including latest higher-order corrections, are compatible within their uncertainties, and also compatible with the data.

Fig. 4

The \({H} \rightarrow {\gamma }{\gamma } \) differential cross section for inclusive events as a function of (upper left) \(p_{\mathrm {T}}^{\gamma \gamma } \), (upper right) \(|y^{\gamma \gamma } | \), (lower left) \(\Delta \phi ^{\gamma \gamma } \), and (lower right) \(|\cos \theta ^{*} |\). All the SM contributions are normalized to their cross section from Ref. [20]. Theoretical uncertainties in the renormalization and factorization scales, PDF, and branching fraction are added in quadrature. The error bars on data points reflect both statistical and systematic uncertainties. The last bin of \(p_{\mathrm {T}}^{\gamma \gamma } \) distribution sums the events above 200\(\,\text {GeV}\). For each graph, the bottom panel shows the ratio of data to theoretical predictions from the \({\textsc {powheg}} \) generator

Fig. 5

The \({H} \rightarrow {\gamma }{\gamma } \) differential cross section for H+jets events as a function of (upper left) \(N_\text {jets} \), (upper right) \(p_{\mathrm {T}} ^\mathrm {j1} \), (lower left) \(|y^{\gamma \gamma }-y^{\mathrm {j1}} | \), with jets within \(|\eta |<2.5\), and (lower right) \(m_\mathrm {jj} \) with jets within \(|\eta |<4.7\). All the SM contributions are normalized to their cross section from Ref. [20]. Theoretical uncertainties in the renormalization and factorization scales, PDF, and branching fraction are added in quadrature. The error bars on data points reflect both statistical and systematic uncertainties. In each distribution, the last bin corresponds to the sum over the events beyond the bins shown in the figure. For each graph, the bottom panel shows the ratio of data to theoretical predictions from the powheg generator

Fig. 6

The \({H} \rightarrow {\gamma }{\gamma } \) differential cross section for H+2 jets events, with jets within \(|\eta |<4.7\), as a function of (upper left) \(\Delta \phi ^{\mathrm {jj}} \), (upper right) \(\Delta \eta ^{\mathrm {jj}} \), (lower left) Zeppenfeld variable, and (lower right) \(\Delta \phi ^{\gamma \gamma ,\mathrm {jj}} \). All the SM contributions are normalized to their cross section from Ref. [20]. Theoretical uncertainties in the renormalization and factorization scales, PDF, and branching fraction are added in quadrature. The error bars on data points reflect both statistical and systematic uncertainties. In the Zeppenfeld distribution, the last bin sums the events with values above 1.2. For each graph, the bottom panel shows the ratio of data to theoretical predictions from the \({\textsc {powheg}} \) generator

Figure 4 (upper left) shows the \(p_{\mathrm {T}} \) distribution of the Higgs boson, which is sensitive to higher-order corrections in perturbative QCD. Figure 4 (upper right) shows the absolute rapidity distribution of the Higgs boson, which is sensitive to the proton PDF, as well as to the production mechanism. Figure 4 (lower left) shows the \(\Delta \phi ^{\gamma \gamma } \) distribution. Figure 4 (lower right) displays the \(\cos \theta ^{*}\) distribution, which is sensitive to the spin of the Higgs boson. The two spin-2 samples indicate deviations relative to the SM predictions. As in the case of Ref. [12], the data do not have sufficient sensitivity to discriminate between spin-2 and spin-0 hypotheses.

Figure 5 (upper left) shows the distribution in the number of jets, where the last bin gives the cross section for signal events with at least three jets. Figure 5 (upper right) shows the \(p_{\mathrm {T}} \) distribution for the leading jet, which is sensitive to higher-order QCD effects, and Fig. 5 (lower left) shows the rapidity difference between the Higgs boson and the leading jet. The distribution of the dijet mass shown in Fig. 5 (lower right) is sensitive to the VBF production mechanism, and especially to a possible anomalous electroweak production in the tail of the distribution. Large contributions from new processes modifying triple gauge couplings would be detected as excesses either in the \(p_{\mathrm {T}}^{\gamma \gamma } \), \(p_{\mathrm {T}} ^\mathrm {j1} \), or \(m_\mathrm {jj} \) distributions. The distributions in data are compatible with expectations from the SM within their uncertainties. Figure 6 (upper left) shows the difference in azimuthal angle \(\Delta \phi ^{\mathrm {jj}} \) between the two jets of highest \(p_{\mathrm {T}} \). Figures 6 (upper right), 6 (lower left), and 6 (lower right) show, respectively, the distribution of the pseudorapidity difference between the two jets, the Zeppenfeld variable, and the azimuthal difference between the Higgs boson and the dijet system. These angular variables are sensitive to the VBF topology, and large contributions from anomalous couplings are not observed in data. The distributions in data are compatible with SM predictions within their statistical, systematic, and theoretical uncertainties.

8 Summary

A measurement was carried out of differential cross sections as a function of kinematic observables in the \({H} \rightarrow {\gamma }{\gamma }\) decay channel, using data collected by the CMS experiment at \(\sqrt{s}=8\,\text {TeV} \), corresponding to an integrated luminosity of 19.7\(\,\text {fb}^\text {-1}\). The measurement was performed for events with two isolated photons in the kinematic range \(p_{\mathrm {T}} ^{\gamma 1}/m_{\gamma \gamma } >1/3\) and \(p_{\mathrm {T}} ^{\gamma 2}/m_{\gamma \gamma } >1/4\), with photon pseudorapidities within \(|\eta |<2.5\). Photon identification was chosen to reduce the dependence of the measurement on the kinematics of the signal. Event classification relied on an estimator of diphoton mass resolution. The signal extraction and the unfolding of experimental resolution were performed simultaneously in all bins of the chosen observables. In this kinematic range, the fiducial cross section was measured to be \(32 \pm 10\) \(\text {\,fb}\).

The differential cross section of the Higgs boson was measured, inclusively in the number of jets, as a function of its transverse momentum \(p_{\mathrm {T}}^{\gamma \gamma } \), its rapidity \(|y^{\gamma \gamma } | \), the Collins–Soper angular variable \(\cos \theta ^{*}\), the difference in azimuthal angle between the two photons \(\Delta \phi ^{\gamma \gamma } \), and the number of associated jets \(N_\text {jets} \). The transverse momentum of the leading jet \(p_{\mathrm {T}} ^\mathrm {j1} \) and the difference in rapidity between the Higgs boson and the leading jet \(|y^{\gamma \gamma }-y^{\mathrm {j1}} | \) were determined in events with at least one accompanying jet. In events with at least two jets, measurements were made of the dijet mass \(m_\mathrm {jj} \), the azimuth between the two jets \(\Delta \phi ^{\mathrm {jj}} \), the pseudorapidity difference between the two jets \(\Delta \eta ^{\mathrm {jj}} \), the Zeppenfeld variable \(|\eta ^{\gamma \gamma }-(\eta ^{\mathrm {j1}}+\eta ^{\mathrm {j2}})/2 | \), and the azimuthal angle between the Higgs boson and the dijet system \(\Delta \phi ^{\gamma \gamma ,\mathrm {jj}} \). The differential cross sections were compared with several SM and beyond SM calculations, and found to be compatible with the SM predictions within statistical, systematic, and theoretical uncertainties. With more data, anomalous couplings of the Higgs boson could be measured from its differential distributions, and it would become possible to discriminate among the different SM predictions for Higgs boson production and identify those providing the best description.

Table 1 Values of the \(\mathrm {p}\mathrm {p}\rightarrow {H} \rightarrow {\gamma }{\gamma } \) differential cross sections as a function of kinematic observables as measured in data and as predicted in SM simulations. For each observable the fit to \(m_{\gamma \gamma }\) is performed simultaneously in all the bins. Since the signal mass is profiled for each observable, the best fit \(\hat{m}_{H} \) varies from observable to observable