1 Introduction

This paper presents a search for massive resonances decaying into a W and a standard model (SM) Higgs boson (H) [1–4] in the \(\ell \nu \mathrm{b} {\bar{\mathrm{b}}}\) (\(\ell = \mathrm {e}\), \(\mu \)) final state. Such processes are distinctive features of several extensions of the SM such as composite Higgs [5–7], SU(5)/SO(5) Littlest Higgs (LH) [8–11], technicolor [12, 13], and left-right symmetric models [14]. These models provide solutions to the hierarchy problem and predict new particles including additional gauge bosons such as a heavy \({\mathrm{W}^{\prime }}\). The \({\mathrm{W}^{\prime }}\) in these models can have large branching fractions to WH and WZ, while the decays to fermions can be suppressed. The recently proposed heavy vector triplet (HVT) model [15] generalizes a large class of specific models that predict new heavy spin-1 vector bosons. In this model, the resonance is described by a simplified Lagrangian in terms of a small number of parameters representing its mass and couplings to SM bosons and fermions.

For a \({\mathrm{W}^{\prime }}\) with SM couplings to fermions and thus reduced decay branching ratio to SM bosons, the most stringent limits on production cross sections are reported in searches with leptonic final states [16, 17]. The current lower limit on the \({\mathrm{W}^{\prime }}\) mass is 3.3\(~\text {TeV}\). In the same context, searches for a \({\mathrm{W}^{\prime }}\) decaying into a pair of SM vector bosons (WZ) [18–21] provide a lower mass limit of 1.7\(~\text {TeV}\). In the context of an HVT model with reduced couplings to fermions (HVT model B), the most stringent limit of 1.7\(~\text {TeV}\) on the \({\mathrm{W}^{\prime }}\)/\(\mathrm{Z}^{\prime } \) mass is set by a search for \({\mathrm{W}}^\prime /{\mathrm{Z}}^\prime \rightarrow \mathrm{WH}/\mathrm{ZH} \rightarrow \mathrm{q} {\bar{\mathrm{q}}} \mathrm{b} {\bar{\mathrm{b}}}\) [22]. The same model is used to interpret the results of a search for \(\mathrm{W}^\prime /\mathrm{Z}^\prime \rightarrow \mathrm{WH}/\mathrm{ZH} \rightarrow \ell \nu /\ell \ell /\nu \nu +\mathrm{b} {\bar{\mathrm{b}}}\) [23]. A lower limit on the \({\mathrm{W}^{\prime }}\) mass of 1.5\(~\text {TeV}\) is set in the same final state reported in Ref. [23]. Finally, a specific search for \(\mathrm{Z}^\prime \rightarrow \mathrm{ZH} \rightarrow \mathrm{q}{\bar{\mathrm{q}}} \tau ^{+}\tau ^{-}\) was reported in Ref. [24] and interpreted in the context of the same HVT model B.

This analysis is based on proton–proton collision data at \(\sqrt{s}=8\) TeV collected by the CMS experiment at the CERN LHC during 2012, corresponding to an integrated luminosity of 19.7\(\,\text {fb}^{{-1}}\). The signal considered is the production of a resonance with mass above 0.8\(~\text {TeV}\) decaying into WH, where the Higgs boson decays into a bottom quark–antiquark pair and the W boson decays into a charged lepton and a neutrino (Fig. 1). It is assumed that the resonance is narrow, i.e. that its intrinsic width is much smaller than the experimental resolution.

Fig. 1 Production of a resonance decaying into WH

The search strategy is closely related to the search for high mass WW resonances in the \(\ell \nu \mathrm{q} {\bar{\mathrm{q}}}\) final state, described in Ref. [25], with the addition of b tagging techniques. We search for resonances in the invariant mass of the WH system on top of a smoothly falling background distribution, where the background mainly comprises events involving pair produced top quarks (\(\mathrm{t}\overline{\mathrm{t}}\)) or a W boson produced in association with jets (W+jets). For the resonance mass range considered, the two quarks from the Higgs boson decay would be separated by a small angle, resulting in the detection of a single jet after hadronization. This jet is tagged as coming from a Higgs boson through the estimation of its invariant mass, application of jet substructure techniques [26], and use of specialized b tagging techniques for high transverse momentum (\(p_{\mathrm {T}} \)) Higgs bosons [27].

The results of this analysis are also combined with two previous results [22, 24] to obtain a further improvement in sensitivity.

2 CMS detector

The central feature of the CMS apparatus is a superconducting solenoid of 6\(\text {\,m}\) internal diameter, providing a field of 3.8\(\text {\,T}\). Within the field volume are a silicon pixel and strip tracker, a crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadronic calorimeter (HCAL). The CMS tracker consists of 1440 silicon pixel and 15 148 silicon strip detector modules covering a pseudorapidity range of \(|\eta |< 2.5\). The ECAL consists of nearly 76 000 lead tungstate crystals, which provide coverage of \(|\eta |< 1.48\) in the central barrel region and \(1.48 <|\eta | < 3.00\) in the two forward endcap regions. The HCAL consists of a sampling calorimeter [28], which utilizes alternating layers of brass as an absorber and plastic scintillator as an active material, covering the range \(|\eta |< 3\), and is extended to \(|\eta |< 5\) by a forward hadron calorimeter. Muons are measured in the range \(|\eta |< 2.4\) with detection planes which employ three technologies: drift tubes, cathode strip chambers, and resistive-plate chambers. The muon trigger combines the information from the three sub-detectors with a coverage up to \(|\eta |<2.1\). A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [28].

3 Simulated samples

For the modelling of the background we use the MadGraph v5.1.3.30 [29] event generator to simulate the production of W boson and Drell–Yan events in association with jets, the powheg 1.0 r1380 [30–35] package to generate \(\mathrm{t}\overline{\mathrm{t}}\) and single top quark events, and pythia v6.424 [36] for diboson (WW, WZ, and ZZ) processes. All simulated event samples are generated using the CTEQ6L1 [37] parton distribution functions (PDF) set, except for the powheg \(\mathrm{t}\overline{\mathrm{t}}\) sample, for which the CT10 PDF set [38] is used. All the samples are then processed further by pythia, using the Z2* tune [39, 40] for simulation of parton showering and subsequent hadronization, and for simulation of the underlying event. The passage of the particles through the CMS detector is simulated using the Geant4 package [41]. All simulated background samples are normalized to the integrated luminosity of the recorded data, using inclusive cross sections determined at next-to-leading order, or next-to-next-to-leading order when available, calculated with mcfm v6.6 [42–45] and fewz v3.1 [46], except for the \(\mathrm{t}\overline{\mathrm{t}}\) sample, for which Top++ v2.0 [47] is used.

To simulate the signature of interest, we use a model of a generic narrow spin-1 \({\mathrm{W}^{\prime }}\) resonance implemented with MadGraph. We verified that the kinematic distributions agree with those predicted by implementations of the LH, composite Higgs and HVT models in MadGraph. The resonance width differs in the three models, but in each case it is found to be negligible with respect to the experimental resolution. More details on the parameters used for interpretation of the models are given in Sect. 8.

Extra proton–proton interactions are combined with the generated events before detector simulation to match the observed distribution of the number of additional interactions per bunch crossing (pileup). The simulated samples are also corrected for observed differences between data and simulation in the efficiencies of the lepton trigger [16], the lepton identification/isolation [16], and the selection criteria identifying jets originating from hadronization of bottom quarks (b-tagged jets) [27].

4 Reconstruction and selection of events

4.1 Trigger and basic event selection

Candidate events are selected during data taking using single-lepton triggers, which require either one electron or one muon without isolation requirements. For electrons the minimum transverse momentum \(p_{\mathrm {T}}\) measured at the high level trigger is 80\(~\text {GeV}\), while for muons the \(p_{\mathrm {T}}\) must be greater than 40\(~\text {GeV}\).

After trigger selection, all events are required to have at least one primary-event vertex reconstructed within a 24\(\text {\,cm}\) window along the beam axis, with a transverse distance from the nominal pp interaction region of less than 2\(\text {\,cm}\)  [48]. If more than one identified vertex passes these requirements, the primary-event vertex is chosen as the one with the highest sum of \(p_{\mathrm {T}} ^{2}\) over its constituent tracks.
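The vertex choice described above can be illustrated with a minimal Python sketch. It is not the CMS reconstruction code: the dictionary layout of a vertex is hypothetical, and the 24 cm window is assumed here to mean \(|z| < 24\) cm with respect to the nominal interaction point.

```python
import math

def choose_primary_vertex(vertices, max_abs_z=24.0, max_rho=2.0):
    """Return the accepted vertex with the largest sum of track pT^2, or None.

    Each vertex is a dict with keys x, y, z (cm) and track_pts (GeV).
    This layout is hypothetical and chosen only for illustration.
    """
    good = [
        v for v in vertices
        # assumed reading of the text: |z| < 24 cm and transverse distance < 2 cm
        if abs(v["z"]) < max_abs_z and math.hypot(v["x"], v["y"]) < max_rho
    ]
    if not good:
        return None
    # rank the surviving candidates by the sum of pT^2 over their tracks
    return max(good, key=lambda v: sum(pt * pt for pt in v["track_pts"]))
```

Events for which the function returns None would fail the vertex requirement.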

Individual particle candidates are reconstructed and identified using the CMS particle-flow (PF) algorithm [49, 50], by combining information from all subdetector systems. The reconstructed PF candidates are each assigned to one of the five candidate categories: electrons, muons, photons, charged hadrons, and neutral hadrons.

4.2 Lepton reconstruction and selection

Electron candidates are reconstructed by clustering the energy deposits in the ECAL and then matching the clusters with reconstructed tracks [51]. In order to suppress the multijet background, electron candidates must pass quality criteria tuned for high-\(p_{\mathrm {T}}\) objects and an isolation selection [52]. The total scalar sum of the \(p_{\mathrm {T}}\) over all the tracks in a cone of radius \(\varDelta R = \sqrt{{(\varDelta \eta )^2+(\varDelta \phi )^2} } = 0.3\) around the electron direction, excluding tracks within an inner cone of \(\varDelta R = 0.04\) to remove the contribution from the electron itself, must be less than 5 \(~\text {GeV}\). A calorimetric isolation parameter is calculated by summing the energies of reconstructed deposits in both the ECAL and HCAL, not associated with the electron itself, within a cone of radius \(\varDelta R = 0.3\) around the electron. The veto threshold for this isolation parameter depends on the electron kinematic quantities and the average amount of additional energy coming from pileup interactions, calculated for each event. The electron candidates are required to have \(p_{\mathrm {T}} > 90\) \(~\text {GeV}\) and \(|\eta | < 1.44\) or \(1.57<|\eta |<2.5\), thus excluding the transition region between ECAL barrel and endcaps.

Muons are reconstructed with a global fit using both the tracker and muon systems [53]. An isolation requirement is applied in order to suppress the background from multijet events in which muons are produced in the semileptonic decay of B hadrons. A cone of radius \(\varDelta R = 0.3\) is constructed around the muon direction. Muon isolation requires that the scalar \(p_{\mathrm {T}}\) sum over all tracks originating from the interaction vertex within the cone, excluding the muon itself, is less than 10 % of the \(p_{\mathrm {T}}\) of the muon. The muon candidates are required to have \(p_{\mathrm {T}} > 50\) \(~\text {GeV}\) and \(|\eta | < 2.1\) in each selected event.
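As an illustration of the track-based isolation requirements quoted for electrons and muons, the following Python sketch computes \(\varDelta R\) and the scalar \(p_{\mathrm {T}}\) sum in a cone. The dictionary keys and function names are hypothetical, and the caller is assumed to pass only tracks from the interaction vertex that do not include the lepton's own track.

```python
import math

def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def track_isolation(lepton, tracks, outer=0.3, inner=0.0):
    """Scalar pT sum of tracks with inner <= Delta R < outer around the lepton."""
    total = 0.0
    for trk in tracks:
        dr = delta_r(lepton["eta"], lepton["phi"], trk["eta"], trk["phi"])
        if inner <= dr < outer:
            total += trk["pt"]
    return total

def muon_is_isolated(muon, tracks, max_rel_iso=0.10):
    """Muon criterion from the text: cone sum below 10% of the muon pT."""
    return track_isolation(muon, tracks) < max_rel_iso * muon["pt"]
```

For the electron criterion, the same helper would be called with inner=0.04 and the result compared with the absolute 5 GeV threshold.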

Events are required to contain exactly one lepton candidate (electron or muon). That is, events are rejected if they contain a second lepton candidate with \(p_{\mathrm {T}} > 35\) \(~\text {GeV}\) (electrons) or \(p_{\mathrm {T}} > 20\) \(~\text {GeV}\) (muons).

4.3 Jets and missing transverse momentum reconstruction

Hadronic jets are identified by clustering PF candidates, using the FastJet v3.0.1 software package [54]. In the jet-clustering procedure, charged PF candidates associated with pileup vertices are excluded, to reduce contamination from pileup. In order to identify a Higgs boson decaying into bottom quarks, jets are clustered using the Cambridge–Aachen algorithm [55] with a distance parameter of 0.8 (“CA8 jets”). Only the highest \(p_{\mathrm {T}}\) CA8 jet is used. Jets in the event are also identified using the anti-\(k_{\mathrm {T}}\) jet-clustering algorithm [56] with a distance parameter of 0.5 (“AK5 jets”). AK5 jets are required to be separated from the CA8 jet by \(\varDelta R > 0.8\). An event-by-event correction based on the projected area of the jet on the front face of the calorimeter is used to remove the extra energy deposited in jets by neutral particles coming from pileup. Furthermore, jet energy corrections are applied, based on measurements in dijet and photon+jet events in data [57]. Additional quality criteria are applied to the jets in order to remove spurious jet-like features originating from calorimeter noise [58]. The CA8 (AK5) jets are required to be separated from the selected electron or muon candidate by \(\varDelta R>0.8\) (0.3). Only jets with \(p_{\mathrm {T}} >30\) \(~\text {GeV}\) and \(|\eta |<2.4\) are allowed in the subsequent steps of the analysis. Furthermore, CA8 jets are not used in the analysis if their pseudorapidity falls in the region \(1.0 <|\eta |< 1.8\), thus overlapping the barrel-endcap transition region of the silicon tracker. In that region, ’noise’ can arise when the tracking algorithm reconstructs many fake displaced tracks associated with the jet. The simulation does not sufficiently describe the full material budget of the tracking detector in that region, thus it does not accurately describe this effect. 
Without this requirement, such noise could bias the b tagging, jet substructure, and missing transverse momentum information used in this analysis. The probability that the CA8 jet in a signal event falls outside the region \(1.0 <|\eta |< 1.8\) is 80 % (92 %) for a resonance mass of 1.0 (2.5)\(~\text {TeV}\).

A b tagging algorithm, known as the combined secondary vertex algorithm [27, 59], is applied to reconstructed AK5 jets to identify whether they originate from bottom quarks. This method allows the identification and rejection of the \(\mathrm{t}\overline{\mathrm{t}}\) events as described in Sect. 4.6. The chosen algorithm working point provides a misidentification rate for light-parton jets of \(\sim \)1 % and an efficiency of \(\sim \)70 % [27]. The simulated events are reweighted event-by-event with the ratio of the b tagging efficiency in data and simulation, determined in a sample enriched with b-jets. The average value of the correction factor is 0.95. The same b tagging algorithm is also used to identify whether the CA8 jet comes from a Higgs boson decaying into bottom quarks, as described in Sect. 4.5.

The missing transverse momentum \(p_{\mathrm {T}} ^\text {miss}\) is defined as the magnitude of the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all the reconstructed particles in an event. The raw \(p_{\mathrm {T}} ^\text {miss}\) value is modified to account for corrections to the energy-momentum scale of all the reconstructed AK5 jets in the event. More details on the \(p_{\mathrm {T}} ^\text {miss}\) performance in CMS can be found in Refs. [60, 61]. A requirement of \(p_{\mathrm {T}} ^\text {miss} > 80\,(40)\) \(~\text {GeV}\) is applied for the electron (muon) channel. The higher threshold for the electron channel is motivated by the higher contribution from the multijet background expected in the low-\(p_{\mathrm {T}} ^\text {miss}\) range due to jets misidentified as electrons. This multijet background is expected to be negligible in the muon channel, for which a lower \(p_{\mathrm {T}} ^\text {miss}\) threshold can be used to preserve a higher efficiency for a low-mass signal.

4.4 The \(\mathrm {W}\rightarrow \ell \nu \) reconstruction and identification

The identified electron or muon is associated with the \(\mathrm {W}\rightarrow \ell \nu \) candidate. The \(p_{\mathrm {T}}\) of the undetected neutrino is assumed to be equal to the \(p_{\mathrm {T}} ^\text {miss}\). The longitudinal component \(p_{z,\nu }\) of the neutrino momentum is calculated following a method used originally for the reconstruction of the invariant mass of the top quark as described in Ref. [62]. The method aims to solve a quadratic equation that makes use of the known W boson mass. Kinematic ambiguities in the solution of the equation are resolved as in Ref. [62]. The four-momentum of the neutrino is used to build the four-momentum of the \(\mathrm {W}\rightarrow \ell \nu \) candidate.
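The quadratic constraint can be made explicit with a short Python sketch. It treats both the lepton and the neutrino as massless and solves \(m_{\mathrm {W}}^2 = (p_\ell + p_\nu )^2\) for \(p_{z,\nu }\). The W mass value and the way the two-fold ambiguity is resolved (smaller \(|p_{z,\nu }|\), real part for complex roots) are assumptions made for illustration; the prescription actually used is the one of Ref. [62].

```python
import math

M_W = 80.4  # GeV; nominal W boson mass, value assumed for this sketch

def neutrino_pz(lep_px, lep_py, lep_pz, met_px, met_py, m_w=M_W):
    """Longitudinal neutrino momentum from the W mass constraint.

    The transverse neutrino momentum is identified with (met_px, met_py).
    """
    lep_e = math.sqrt(lep_px**2 + lep_py**2 + lep_pz**2)
    lep_pt2 = lep_px**2 + lep_py**2
    met2 = met_px**2 + met_py**2
    a = 0.5 * m_w**2 + lep_px * met_px + lep_py * met_py
    # pz_nu = [a*pz_l +/- E_l*sqrt(a^2 - pT_l^2 * pT_nu^2)] / pT_l^2
    disc = a**2 - lep_pt2 * met2
    if disc < 0.0:
        # complex roots: keep the real part (assumed convention)
        return a * lep_pz / lep_pt2
    root = lep_e * math.sqrt(disc)
    sol1 = (a * lep_pz + root) / lep_pt2
    sol2 = (a * lep_pz - root) / lep_pt2
    # assumed convention: keep the solution with the smaller magnitude
    return sol1 if abs(sol1) < abs(sol2) else sol2
```

The returned value, together with the transverse components, fixes the neutrino four-momentum that is added to the lepton four-momentum.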

4.5 The \(\mathrm{H} \rightarrow \mathrm{b} \overline{\mathrm{b}} \) identification using jet substructure and b tagging

The CA8 jets are used to reconstruct the jet candidates from decays of Lorentz-boosted Higgs bosons to bottom quarks. We exploit two techniques to discriminate against quark and gluon jets from the multijet background: the requirement that the reconstructed jet mass be close to the Higgs boson mass, and b tagging methods that discriminate jets originating from b quarks from those originating from lighter quarks or gluons.

First, we apply a jet-grooming technique [26, 63] to re-cluster the jet constituents, while applying additional requirements to remove possible contamination from soft QCD radiation or pileup. Different jet-grooming algorithms have been explored at CMS, and their performance on jets in multijet processes has been studied in detail [63]. In this analysis, we use the jet pruning algorithm [64, 65], which re-clusters each jet starting from all its original constituents using the CA algorithm iteratively, while discarding soft and large-angle recombinations at each step. The performance of the algorithm depends on the two parameters, \(z_\text {cut}=0.1\) and \(D_\text {cut}=m_{\text {jet}}/{p_{\text {T}}^{\text {jet}}}\), which define the maximum allowed hardness and the angle of the recombinations in the clustering algorithm, respectively. A jet is considered as an H-tagged jet candidate if its pruned mass, \(m_{\text {jet}}\), computed from the sum of the four-momenta of the constituents surviving the pruning, falls in the range \(110<m_{\text {jet}}<135\) \(~\text {GeV}\). The \(m_{\text {jet}}\) window is the result of an optimization based on signal sensitivity and on the constraints due to the higher bounds of the signal regions of other diboson analyses [25].
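The role of the two pruning parameters can be sketched in Python. In the usual formulation of pruning (Refs. [64, 65]), a recombination step is vetoed when it is both soft and wide-angle, and the softer branch is then discarded. The function names are hypothetical, and the sketch covers only the per-step decision and the mass window, not the full re-clustering.

```python
def d_cut_for(jet_mass, jet_pt):
    """Angular threshold D_cut = m_jet / pT_jet of the original jet."""
    return jet_mass / jet_pt

def is_pruned(pt_i, pt_j, pt_merged, delta_r_ij, d_cut, z_cut=0.1):
    """True if merging branches i and j is vetoed.

    z is the pT fraction carried by the softer branch; the merge is vetoed
    only when z < z_cut and the branches are separated by more than d_cut.
    """
    z = min(pt_i, pt_j) / pt_merged
    return z < z_cut and delta_r_ij > d_cut

def in_higgs_mass_window(pruned_mass, low=110.0, high=135.0):
    """H-tag candidate requirement 110 < m_jet < 135 GeV from the text."""
    return low < pruned_mass < high
```

For example, a 5 GeV branch merging with a 200 GeV branch at \(\varDelta R = 0.6\) inside a jet with \(m_{\text {jet}}/p_{\mathrm {T}} \approx 0.31\) would be removed, whereas the same branch at \(\varDelta R = 0.1\) would be kept.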

The simulation modelling of the pruned mass measurement for merged jets from heavy bosons has been checked using merged \(\mathrm {W}\rightarrow \overline{\mathrm{q}}\mathrm {q}'\) decays in \(\mathrm{t}\overline{\mathrm{t}} \) events with a \(\ell \)+jets topology [26]. The data are compared with \(\mathrm{t}\overline{\mathrm{t}} \) events generated with MadGraph, interfaced to pythia for parton showering. The differences between recorded and simulated event samples in the pruned jet mass scale and resolution are found to be up to 1.7 and 11 %, respectively. In addition, the modelling of bottom quark fragmentation is checked through reconstruction of the top quark mass in these \(\mathrm{t}\overline{\mathrm{t}} \) events [66].

To discriminate a Higgs-initiated jet, formed by the hadronization of two bottom quarks, from quark and gluon jets, we use an H tagging technique [27]. This procedure splits the candidate H-jet into two sub-jets by reversing the last step of the CA8 pruning recombination algorithm. Depending on the angular separation \(\varDelta R\) of the two sub-jets, different b tagging discriminators are used to tag the H-jet candidate. If \(\varDelta R>0.3\), then the b tagging algorithm is applied to both of the individual sub-jets of the CA8 jet; otherwise, it is applied to the whole CA8 jet. The chosen algorithm working point provides a misidentification rate of 10 % and an efficiency of 80 %. The ratio of the b tagging efficiency between data and simulation, in a sample enriched with b-jets from gluon splitting by requiring two muons within the CA8 jet, is used to reweight the simulated events.
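The sub-jet decision logic can be summarized in a few lines of Python. This is a sketch under stated assumptions: the dictionary keys are hypothetical, the discriminator threshold is left as a free parameter because its value is not given here, and "applied to both sub-jets" is taken to mean that both must pass.

```python
import math

def subjet_delta_r(sj1, sj2):
    """Delta R between two sub-jets, with the azimuthal difference wrapped."""
    dphi = (sj1["phi"] - sj2["phi"]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return math.hypot(sj1["eta"] - sj2["eta"], dphi)

def is_h_tagged(fat_jet, subjets, threshold, min_dr=0.3):
    """Apply b tagging to the sub-jets if they are resolved, else to the CA8 jet."""
    sj1, sj2 = subjets
    if subjet_delta_r(sj1, sj2) > min_dr:
        # resolved sub-jets: both are required to pass (assumption)
        return sj1["btag"] > threshold and sj2["btag"] > threshold
    # merged sub-jets: use the discriminator of the whole CA8 jet
    return fat_jet["btag"] > threshold
```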

4.6 Final event selection and categorization

After reconstructing the W and Higgs bosons, we apply the final selections used for the search. Both the W and Higgs boson candidates must have a \(p_{\mathrm {T}}\) greater than 200\(~\text {GeV}\). In addition, we apply topological selection criteria, requiring that the W and Higgs bosons are approximately back-to-back, since they tend to be isotropically distributed for background events. In particular, the \(\varDelta R\) distance between the lepton and the H-tagged jet must be greater than \(\pi /2\), the azimuthal angular separation between the \(p_{\mathrm {T}} ^\text {miss}\) and the H-tagged jet must be greater than 2.0 radians, and the azimuthal angular separation between the \(\mathrm {W}\rightarrow \ell \nu \) and H-tagged jet candidates must be greater than 2.0 radians. To further reduce the level of the \(\mathrm{t}\overline{\mathrm{t}}\) background, events with one or more reconstructed AK5 jets, not overlapping with the CA8 H-tagged jet candidate as described previously in Sect. 4.3, are analyzed. If one or more of the AK5 jets is b-tagged, the event is rejected. Furthermore, a leptonically decaying top quark candidate mass \(m_\text {top}^\ell \) is reconstructed from the lepton, \(p_{\mathrm {T}} ^\text {miss}\), and the closest AK5 jet to the lepton using the method described in Ref. [62]. A hadronically decaying top quark candidate mass \(m_\text {top}^\mathrm {h}\) is reconstructed from the CA8 H-tagged jet candidate and the closest AK5 jet. Events with \(120<m_\text {top}^\ell <240\) \(~\text {GeV}\) or \(160<m_\text {top}^\mathrm {h}<280\) \(~\text {GeV}\) are rejected. The chosen windows around the top quark mass are the result of an optimization carried out in this analysis, taking into account the asymmetric tails at larger values due to combinatorial background. If several distinct WH resonance candidates are present in the same event, only the candidate with the highest-\(p_{\mathrm {T}}\) H-tagged jet is kept for further analysis. 
The invariant mass of the WH resonance (\(\text{ M }_{\mathrm {W} \mathrm{H} }\)) is required to be at least 0.7\(~\text {TeV}\). The signal efficiency for the full event selection ranges between \(\sim \)3 and \(\sim \)9 %, depending on the resonance mass.
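The final selection can be collected into a single predicate, shown below as a Python sketch. The event is represented by a dictionary with hypothetical keys; angles are in radians and momenta and masses in GeV. The top quark candidate masses are set to None when no AK5 jet is available to build them, which is an assumption about how such events are handled.

```python
import math

def passes_final_selection(ev):
    """Apply the kinematic, topological, and top-veto requirements of Sect. 4.6."""
    if ev["w_pt"] <= 200.0 or ev["h_pt"] <= 200.0:
        return False
    # back-to-back topology of the W and H candidates
    if ev["dr_lep_h"] <= math.pi / 2.0:
        return False
    if abs(ev["dphi_met_h"]) <= 2.0 or abs(ev["dphi_w_h"]) <= 2.0:
        return False
    # veto on b-tagged AK5 jets not overlapping with the CA8 jet
    if ev["n_btag_ak5"] > 0:
        return False
    # veto on leptonic and hadronic top quark candidate masses
    m_lep, m_had = ev["m_top_lep"], ev["m_top_had"]
    if m_lep is not None and 120.0 < m_lep < 240.0:
        return False
    if m_had is not None and 160.0 < m_had < 280.0:
        return False
    return ev["m_wh"] >= 700.0
```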

5 Modelling of background and signal

5.1 Background estimation

After the full event selection, the two dominant remaining backgrounds are expected to come from W+jets and \(\mathrm{t}\overline{\mathrm{t}}\) events. Backgrounds from \(\mathrm{t}\overline{\mathrm{t}}\), single top quark, and diboson production are estimated using simulated samples after applying correction factors derived from control samples in data. For the W+jets background estimation, a procedure based on data has been developed to determine both the normalization and the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) shape.

For the W+jets normalization estimate, a signal-depleted control region is defined outside the \(m_{\text {jet}}\) mass window described in Sect. 4.5. A lower sideband region is defined in the \(m_{\text {jet}}\) range [40, 110]\(~\text {GeV}\) as well as an upper sideband in the range [135, 150]\(~\text {GeV}\). The overall normalization of the W+jets background in the signal region is determined from the likelihood of the sum of backgrounds fit to the \(m_{\text {jet}}\) distribution in both sidebands of the observed data. In this approach, simulated events are replaced by an analytical function, which has been determined individually for each background process. Figure 2 shows the result of this fit procedure, where all selections are applied except the final \(m_{\text {jet}}\) signal window requirement. The inclusive W+jets background is predicted from a fit excluding the signal region (between the vertical dashed lines), while the other backgrounds are estimated from simulation.

Fig. 2 Distributions of the pruned jet mass, \(m_{\text {jet}}\), in the electron (top) and muon (bottom) channels. The signal region lies between the dashed vertical lines. The hatched region indicates the statistical uncertainty of the fit. At the bottom of each plot, the bin-by-bin fit residuals, \((\text {Data}-\text {Fit})/\sigma _\text {data}\), are shown

The shape of the W+jets background as a function of \(\text{ M }_{\mathrm {W} \mathrm{H} }\) in the signal region is estimated using the lower sideband region of the \(m_{\text {jet}}\) distribution. Correlations needed to extrapolate from the sideband to the signal region are determined from simulation through an extrapolation function defined as:

$$\begin{aligned} \alpha _\mathrm {MC}(\text{ M }_{\mathrm {W} \mathrm{H} }) = \frac{F_\mathrm {MC, SR}^{\mathrm {W}+\text {jets}}(\text{ M }_{\mathrm {W} \mathrm{H} })}{F_\mathrm {MC, SB}^{\mathrm {W}+\text {jets}}(\text{ M }_{\mathrm {W} \mathrm{H} })}, \qquad (1) \end{aligned}$$

where \(F_\mathrm {MC, SR}^{\mathrm {W}+\text {jets}}\) and \(F_\mathrm {MC, SB}^{\mathrm {W}+\text {jets}}\) are the probability density functions determined from the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) spectrum in simulation for the signal region and low-\(m_{\text {jet}}\) sideband region, respectively.

In order to estimate the W+jets contribution \(F_{\text {DATA}, \mathrm {SB}}^{\mathrm {W}+\text {jets}}\) in the control region of the data, the other backgrounds are subtracted from the observed \(\text{ M }_{\mathrm {W} \mathrm{H} }\) distribution in the lower sideband region. The shape of the W+jets background distribution in the signal region is obtained by scaling \(F_{\text {DATA}, \mathrm {SB}}^{\mathrm {W}+\text {jets}}\) according to \(\alpha _\mathrm {MC}\). The final prediction of the background contribution in the signal region, \(N^\text {BKGD}_\mathrm {SR}\), is given by

$$\begin{aligned} N^\mathrm {BKGD}_\mathrm {SR}(\text{ M }_{\mathrm {W} \mathrm{H} }) ={}& C_{\mathrm {SR}}^{\mathrm {W}+\text {jets}}\, F_{\text {DATA}, \mathrm {SB}}^{\mathrm {W}+\text {jets}}(\text{ M }_{\mathrm {W} \mathrm{H} })\, \alpha _\mathrm {MC}(\text{ M }_{\mathrm {W} \mathrm{H} }) \\ &+ \sum _{k} C_{\mathrm {SR}}^{k}\, F_\mathrm {MC, SR}^{k}(\text{ M }_{\mathrm {W} \mathrm{H} }), \qquad (2) \end{aligned}$$

where the index k runs over the list of minor backgrounds, and \(C_{\mathrm {SR}}^{\mathrm {W}+\text {jets}}\) and \(C_{\mathrm {SR}}^{k}\) represent the normalizations of the yields of the dominant W+jets background and of the different minor background contributions. The \(C_{\mathrm {SR}}^{\mathrm {W}+\text {jets}}\) parameter is determined from the fit to the \(m_{\text {jet}}\) distribution as described above, while each \(C_{\mathrm {SR}}^{k}\) is determined from simulation. The ratio \(\alpha _\mathrm {MC}\) accounts for the small kinematic differences between signal and sideband regions, and is largely independent of the assumptions on the overall cross section. The validity and robustness of this method have been studied in data using a lower \(m_{\text {jet}}\) sideband of [40, 80]\(~\text {GeV}\) to predict an alternate signal region with \(m_{\text {jet}}\) in the range [80, 110]\(~\text {GeV}\). Both the normalization and shape of the W+jets background are successfully estimated for the alternate signal region. This alternate signal region differs from the signal region of the search for WW or WZ resonances in Ref. [25] as b tagging is applied to the CA8 jet. We are therefore able to evaluate the potential WW and WZ signal contamination in the alternate signal region and find less than 5 % signal contamination, assuming a signal cross section corresponding to the exclusion limit for a WW resonance from Ref. [25]. The \(\text{ M }_{\mathrm {W} \mathrm{H} }\) distribution of the background in the signal and lower sideband regions is described analytically by a function defined as \(f(x)\propto \exp [-x/(c_0+c_{1}x)]\), which is found to describe the simulation well. Alternative fit functions have been studied but in all cases the background shapes agree with that of the default function within uncertainties.
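To make the structure of Eqs. (1) and (2) concrete, the following Python sketch builds the falling shape \(f(x)\propto \exp [-x/(c_0+c_{1}x)]\), the ratio \(\alpha _\mathrm {MC}\), and the summed prediction. It is not the fitting code of the analysis: all parameter values used with it are placeholders, the normalization is done with a simple midpoint rule, and the product \(F_{\text {DATA}, \mathrm {SB}}\,\alpha _\mathrm {MC}\) is used as written in Eq. (2), without any further renormalization.

```python
import math

def make_pdf(c0, c1, lo, hi, steps=2000):
    """Return f(x) proportional to exp[-x/(c0 + c1*x)], unit area on [lo, hi]."""
    def raw(x):
        return math.exp(-x / (c0 + c1 * x))
    width = (hi - lo) / steps
    # midpoint-rule estimate of the integral of the unnormalized shape
    area = sum(raw(lo + (i + 0.5) * width) for i in range(steps)) * width
    return lambda x: raw(x) / area

def alpha_mc(f_mc_sr, f_mc_sb):
    """Eq. (1): ratio of simulated signal-region and sideband shapes."""
    return lambda m: f_mc_sr(m) / f_mc_sb(m)

def predicted_background(m, c_wjets, f_data_sb, alpha, minor_backgrounds):
    """Eq. (2): W+jets term from data plus simulated minor backgrounds.

    minor_backgrounds is a list of (normalization, shape) pairs.
    """
    wjets = c_wjets * f_data_sb(m) * alpha(m)
    minor = sum(c_k * f_k(m) for c_k, f_k in minor_backgrounds)
    return wjets + minor
```

A useful closure check of the construction is that, when the data sideband shape coincides with the simulated one, the W+jets term reduces to the simulated signal-region shape scaled by \(C_{\mathrm {SR}}^{\mathrm {W}+\text {jets}}\).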

For the \(\mathrm{t}\overline{\mathrm{t}}\) background estimate, a control sample is selected by applying all analysis requirements, except that the b-tagged jet veto is inverted, the veto on the top quark mass is dropped, and the \(m_{\text {jet}}\) requirement is removed. The data are compared with the predictions from simulation and good agreement is found. The pruned jet mass distribution in the top quark enriched control sample is shown in Fig. 3. The pruned jet mass distribution shows a small peak due to isolated W boson decays into hadrons, along with a smoothly varying combinatorial component mainly due to events in which the extra b-tagged jet from the top quark decay is in the proximity of the W boson. The difference in normalization between data and simulation is found to be \(4.6 \pm 5.6\) %, where the quoted uncertainty is only statistical. This normalization difference is applied to correct the normalization of \(\mathrm{t}\overline{\mathrm{t}}\) background in the signal region. The relative uncertainty of 5.6 % is used to quantify the uncertainty in the \(\mathrm{t}\overline{\mathrm{t}}\) and single top quark background normalization, as described in Sect. 6.1.

Fig. 3 Distributions of \(m_{\text {jet}}\) in the top quark enriched control sample in the electron (top) and muon (bottom) channels. The hatched region indicates the overall uncertainty in the background. In the lower panels, the bin-by-bin residuals, \((\text {Data}-\mathrm {MC})/\sigma \) are shown, where \(\sigma \) is the sum in quadrature of the statistical uncertainty of the data, the simulation, and the systematic uncertainty in the \(\mathrm{t}\overline{\mathrm{t}}\) background

5.2 Modelling of the signal mass distribution

The shape of the reconstructed signal mass distribution is extracted from the simulated signal samples. In the final analysis of the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) spectrum, the statistical signal sensitivity depends on an accurate description of the signal shape. The signal shape is parametrized with a double-sided Crystal Ball function (i.e. a Gaussian core with power-law tails on both sides) [67] to describe the CMS detector resolution. Figure 4 shows an example of this parametrization for a \({\mathrm{W}^{\prime }}\) mass of 1.5\(~\text {TeV}\). To take into account differences between the electron and muon \(p_{\mathrm {T}}\) resolutions at high \(p_{\mathrm {T}}\), the signal mass distribution is parametrized separately for events with electrons and muons. The resolution of the reconstructed \(\text{ M }_{\mathrm {W} \mathrm{H} }\) is given by the width of the Gaussian core and is found to be 4–6 %.
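For reference, a common form of the double-sided Crystal Ball function is sketched below in Python (unnormalized, with value 1 at the peak). The parameter names are hypothetical, and the exact parametrization used in the analysis is the one of Ref. [67], which may differ in conventions.

```python
import math

def double_sided_crystal_ball(x, mean, sigma, alpha_lo, n_lo, alpha_hi, n_hi):
    """Gaussian core with power-law tails below -alpha_lo and above +alpha_hi.

    alpha_* are the (positive) transition points in units of sigma and
    n_* the tail exponents; the function is continuous at both transitions.
    """
    t = (x - mean) / sigma
    if t < -alpha_lo:
        a = (n_lo / alpha_lo) ** n_lo * math.exp(-0.5 * alpha_lo**2)
        b = n_lo / alpha_lo - alpha_lo
        return a * (b - t) ** (-n_lo)
    if t > alpha_hi:
        a = (n_hi / alpha_hi) ** n_hi * math.exp(-0.5 * alpha_hi**2)
        b = n_hi / alpha_hi - alpha_hi
        return a * (b + t) ** (-n_hi)
    return math.exp(-0.5 * t * t)
```

With a mean of 1500 GeV, a Gaussian width of 75 GeV would correspond to a 5 % resolution, within the 4–6 % range quoted above; these numbers are illustrative and not fitted values.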

Fig. 4 Final distributions in \(\text{ M }_{\mathrm {W} \mathrm{H} }\) for data and expected backgrounds for electron (top) and muon (bottom) categories. The 68 % error bars for Poisson event counts are obtained from the Neyman construction [77]. The hatched region indicates the statistical uncertainty of the fit combined with the systematic uncertainty in the shape. This figure also shows a hypothetical \({\mathrm{W}^{\prime }}\) signal with mass of 1.5\(~\text {TeV}\), normalized to the cross section predicted by the HVT model B with parameter \(g_\mathrm {V} =3\) as described in Sect. 8.2

6 Systematic uncertainties

6.1 Systematic uncertainties in the background estimation

Uncertainties in the estimation of the background affect both the normalization and the shape of the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) distribution. The systematic uncertainty in the W+jets background yield is dominated by the statistical uncertainty associated with the number of events in data in the \(m_{\text {jet}}\) sideband regions, and it is found to be about 59 % (42 %) in the electron (muon) channel. The systematic uncertainty in the \(\mathrm{t}\overline{\mathrm{t}}\) normalization comes from the data-to-simulation ratio derived in the top-quark-enriched control sample (5.6 %) as described in Sect. 5.1. The systematic uncertainties in the WW, WZ, and ZZ inclusive cross sections are assigned to be 10 %, taken from the relative difference in the mean value between the CMS WW cross section measurement at \(\sqrt{s}=8\)\(~\text {TeV}\) and the SM expectation [68].

Systematic uncertainties in the W+jets background shape are estimated from the covariance matrix of the fit to the extrapolated data sideband and from the uncertainties in the modelling of \(\alpha _\mathrm {MC}(\text{ M }_{\mathrm {W} \mathrm{H} })\). They are driven by the available data in the sidebands and the number of events generated for the simulation of the W+jets background, respectively. These uncertainties are shown in Fig. 4, and they are found to be about 30 % (120 %) at \(\text{ M }_{\mathrm {W} \mathrm{H} } \approx 1~\text {TeV} \) (1.8\(~\text {TeV}\)). The estimation of the systematic uncertainty in the shape of the \(\mathrm{t}\overline{\mathrm{t}}\) background takes into account the following contributions: the statistical uncertainty associated with the simulated event sample, the choices of renormalization/factorization scales (varied up and down by a factor of 2), the matching scales in the MadGraph simulation, and an observed difference between MadGraph and powheg simulations.

Systematic effects from rare noise events identified in the tracker overlap region were specifically studied in the context of the acceptance requirement introduced for \(\mathrm{H}\)-jet candidates (\(|\eta | < 1.0\) or \(|\eta | > 1.8\)) as described in Sect. 4. Those studies conclude that any residual noise effects following the imposition of this requirement are negligible. No additional source of systematic uncertainty is taken into account for the background predictions.

6.2 Systematic uncertainties in the signal prediction

Systematic uncertainties in the signal prediction affect both the signal efficiency and the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) shape. The primary uncertainties in signal yields are summarized in Table 1 and described below.

Table 1 Summary of the systematic uncertainties in the signal yield, relative to the expected number of events

The systematic uncertainties in the signal efficiency due to the electron energy (E) and muon \(p_{\mathrm {T}}\) scales are evaluated by varying the lepton E or \(p_{\mathrm {T}}\) within one standard deviation of the corresponding uncertainty [51, 53]; the uncertainties due to the electron E and muon \(p_{\mathrm {T}}\) resolutions are estimated by applying an E and a \(p_{\mathrm {T}}\) smearing, respectively. In this process, variations in the lepton E or \(p_{\mathrm {T}}\) are propagated consistently to the \(p_{\mathrm {T}} ^\text {miss}\) vector. We also take into account the systematic uncertainties affecting the observed-to-simulated scale factors for the efficiencies of the lepton trigger, identification and isolation requirements. These efficiencies are derived using a specialized tag-and-probe analysis with \(\mathrm{Z}\rightarrow \ell ^{+}\ell ^{-}\) events [69], and the uncertainty in the ratio of the efficiencies is taken as the systematic uncertainty. The uncertainties in the efficiencies of the electron (muon) trigger and the electron (muon) identification with isolation are 3 % (3 %) and 3 % (4 %), respectively.

The signal efficiency is also affected by the uncertainties in the jet energy-momentum scale and resolution. The jet energy-momentum scale and resolution are varied within their \(p_{\mathrm {T}}\)- and \(\eta \)-dependent uncertainties [57] to estimate their impact on the signal efficiency. The variations are also propagated consistently to the \(p_{\mathrm {T}} ^\text {miss}\) vector.
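The consistent propagation of a scale variation to the \(p_{\mathrm {T}} ^\text {miss}\) vector can be sketched as follows. This is a simplified illustration that assumes a single lepton or jet is varied and that the missing transverse momentum is rebalanced by the change in that object's transverse momentum; the function name and the numerical values are hypothetical.

```python
import math


def shift_object_and_met(obj_pt, obj_phi, met_x, met_y, rel_shift):
    """Scale an object's pT by (1 + rel_shift) and rebalance the pTmiss vector.

    The change in the object's transverse momentum is subtracted from the
    missing transverse momentum, so that the vector sum of the object and
    pTmiss is unchanged by the variation.
    """
    new_pt = obj_pt * (1.0 + rel_shift)
    dpx = (new_pt - obj_pt) * math.cos(obj_phi)
    dpy = (new_pt - obj_pt) * math.sin(obj_phi)
    return new_pt, met_x - dpx, met_y - dpy


# Hypothetical event: a 100 GeV lepton at phi = 0 balanced by pTmiss,
# with the lepton scale shifted up by 1 %.
new_pt, new_met_x, new_met_y = shift_object_and_met(100.0, 0.0, -100.0, 0.0, 0.01)
```

The varied quantities are then passed through the full event selection, and the change in the number of selected signal events gives the corresponding uncertainty in the efficiency.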

The momentum scale uncertainty of particles that are not identified as leptons or clustered in jets (‘unclustered energy-momentum’) is found to introduce an uncertainty of less than 0.5 % in the signal efficiency.

We also include systematic uncertainties in the signal efficiency due to uncertainties in the data-to-simulation scale factors for the pruned jet mass tagging, derived from the top-quark-enriched control sample [26], and for the b-tagged jet identification efficiencies [27]. These sources introduce a systematic uncertainty in the mass tagging and b tagging of the Higgs boson of 2–10 % and 2–8 %, respectively, depending on the signal mass.

The systematic uncertainty due to the modelling of pileup is estimated by reweighting the signal simulation samples such that the distribution of the number of interactions per bunch crossing is shifted according to the uncertainty in the inelastic proton–proton cross section [70, 71].
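The reweighting can be illustrated with a toy sketch. It assumes, for simplicity only, that the number of interactions per bunch crossing follows a single Poisson distribution in both data and simulation and that the mean scales linearly with the inelastic cross section; realistic pileup profiles are broader. The function names, the means, and the 5 % shift are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n interactions for a given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def pileup_weights(mc_mean, data_mean, xsec_shift, n_max=80):
    """Per-multiplicity weights mapping the simulated profile onto the shifted target.

    The target mean is the data mean scaled by (1 + xsec_shift), mimicking a
    relative shift of the inelastic proton-proton cross section.
    """
    target_mean = data_mean * (1.0 + xsec_shift)
    return [poisson_pmf(n, target_mean) / poisson_pmf(n, mc_mean)
            for n in range(n_max + 1)]


# Hypothetical profiles: simulated mean 20, data mean 21, cross section up by 5 %.
weights_up = pileup_weights(20.0, 21.0, +0.05)
weights_down = pileup_weights(20.0, 21.0, -0.05)
```

Each simulated signal event would be weighted according to its number of interactions, and the change in the selection efficiency between the shifted and nominal weights would be taken as the uncertainty.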

The impact of the proton PDF uncertainties on the signal efficiency is evaluated with the PDF4LHC prescription [72, 73], using the MSTW2008 [74] and NNPDF2.1 [75] PDF sets. The uncertainty in the integrated luminosity is 2.6 % [76].

In addition to systematic uncertainties in the signal efficiency discussed above, we consider uncertainties in the signal resonance peak position and width. The systematic effects that could change the signal shape are the uncertainties due to the \(p_{\mathrm {T}}\)/energy-momentum scale and resolution of electrons, muons, jets, and the unclustered energy-momentum scale. For each of these sources of experimental uncertainty, the energy-momentum of the lepton and jets, as well as the corresponding \(p_{\mathrm {T}} ^\text {miss}\) vector, are varied (or smeared) by their relative uncertainties. The uncertainty in the peak position of the signal is estimated to be less than 1 %. The jet energy-momentum scale and resolution introduce a relative uncertainty of about 3 % in the signal width. The unclustered energy-momentum scale introduces an uncertainty in the signal width of 1 % at lower resonance masses (\(<\)1.5\(~\text {TeV}\)), and of 3 % at higher masses.

7 Results

The predicted number of background events in the signal region after the inclusion of all backgrounds is summarized in Table 2 and compared with observations. The yields are quoted in the range \(0.7 < \text{ M }_{\mathrm {W} \mathrm{H} } < 3\) \(~\text {TeV}\). The expected background is derived with the sideband procedure. The uncertainties in the background prediction from data are statistical in nature, as they depend on the number of events in the sideband region. The muon channel has more expected background events than the electron channel owing to the lower \(p_{\mathrm {T}} ^\text {miss}\) requirement on the muon and its worse mass resolution at high \(p_{\mathrm {T}} \).

Figure 4 shows the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) spectra after all selection criteria have been applied. The highest mass event is in the electron category and has \(\text{ M }_{\mathrm {W} \mathrm{H} } \approx 1.9\) \(~\text {TeV}\). The observed data and the predicted background in the muon channel agree. In the electron channel, an excess of three events is observed with \(\text{ M }_{\mathrm {W} \mathrm{H} } > 1.8\) \(~\text {TeV}\), where about 0.3 events are expected, while in the muon channel no events with \(\text{ M }_{\mathrm {W} \mathrm{H} } > 1.8\) \(~\text {TeV}\) are observed, where about 0.3 events are expected.
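As a rough cross-check of the size of the excess in the electron channel, the probability of observing at least three events when about 0.3 are expected can be computed from the Poisson distribution. This counting estimate is only a sketch: it ignores the uncertainty in the background and the shape information used in the full statistical analysis, so it is not expected to reproduce the significance quoted in Sect. 8.1. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_tail_pvalue(n_obs, expected):
    """P(N >= n_obs) for a Poisson variable with the given mean."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(n_obs))
    return 1.0 - below


def one_sided_z(p_value):
    """Equivalent one-sided Gaussian significance for a p-value."""
    return NormalDist().inv_cdf(1.0 - p_value)


p_electron = poisson_tail_pvalue(3, 0.3)
z_electron = one_sided_z(p_electron)
```

With these inputs the tail probability is about 0.0036, corresponding to roughly 2.7 standard deviations.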

Table 2 Observed and expected yields in the signal region together with statistical uncertainties

8 Statistical and model interpretation

8.1 Significance of the data

A comparison between the \(\text{ M }_{\mathrm {W} \mathrm{H} }\) distribution observed in data and the largely data-driven background prediction is used to test for the presence of a resonance decaying into WH. The statistical test is performed based on a profile likelihood discriminant that describes an unbinned shape analysis. Systematic uncertainties in the signal and background yields are treated as nuisance parameters and profiled in the statistical interpretation using log-normal priors.
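For orientation, a generic form of an unbinned, extended likelihood and of the associated discovery test statistic is sketched below. The notation is an assumption introduced for this illustration and is not taken from the analysis: \(\mu \) is the signal strength, \(S\) and \(B\) are the expected signal and background yields, \(f_{S}\) and \(f_{B}\) are the signal and background mass shapes, \(m_i\) are the reconstructed masses of the \(N\) observed events, and \(\pi _j\) are the log-normal constraints on the nuisance parameters \(\theta _j\).

$$\begin{aligned} \mathcal {L}(\mu , \theta )&= \mathrm {Pois}\bigl (N \mid \mu S(\theta ) + B(\theta )\bigr ) \prod _{i=1}^{N} \frac{\mu S(\theta )\, f_{S}(m_i;\theta ) + B(\theta )\, f_{B}(m_i;\theta )}{\mu S(\theta ) + B(\theta )} \prod _{j} \pi _j(\theta _j), \\ q_0&= -2 \ln \frac{\mathcal {L}(0, \hat{\hat{\theta }})}{\mathcal {L}(\hat{\mu }, \hat{\theta })}, \quad \hat{\mu } \ge 0. \end{aligned}$$

Here \(\hat{\mu }\) and \(\hat{\theta }\) maximize the likelihood, while \(\hat{\hat{\theta }}\) maximizes it for \(\mu = 0\). In the asymptotic regime the local significance is approximately \(\sqrt{q_0}\); with few events, as in the high-mass tail, pseudo-experiments are generally needed instead.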

Fig. 5 Local p-value of the combined electron and muon data as a function of the \({\mathrm{W}^{\prime }}\) boson mass, probing a narrow \(\mathrm {W}\mathrm{H} \) resonance

Table 3 Intrinsic total widths (\(\varGamma \)) and cross sections (\(\sigma \)) for the LH model and HVT model B for different resonance masses. The \(\mathrm {W}\mathrm{H} \rightarrow \ell \nu {\mathrm{b} \overline{\mathrm{b}} }\) branching fraction is not included in the calculation

We evaluate the local significance of the observations in the context of the described test, under the assumptions of a narrow resonance decaying into the WH final state and lepton universality for the W boson decay, by combining the two event categories. Correlations arising from the uncertainties common to both channels are taken into account. The result is shown in Fig. 5. The highest local significance of 2.2 standard deviations is found for a resonance mass of 1.8\(~\text {TeV}\), driven by the excess in the electron channel described in Sect. 7. The corresponding local significance for a resonance of 1.8\(~\text {TeV}\) in the electron channel is 2.9 standard deviations, while in the muon channel no excess is observed. Taking into account the look-elsewhere effect [78], a local significance of 2.9 standard deviations translates into a global significance of about 1.9 standard deviations when searching for resonances over the full mass range 0.8–2.5\(~\text {TeV}\) and across two channels. We thus conclude that the results are statistically compatible with the SM expectation within 2 standard deviations.
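The relation between the local and global significances can be made more tangible with a simple trials-factor estimate. The sketch below assumes independent trials, with \(p_\text {global} = 1 - (1 - p_\text {local})^{N}\); this is a swapped-in simplification for illustration and is not the procedure of Ref. [78]. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def p_from_z(z):
    """One-sided p-value corresponding to a Gaussian significance z."""
    return 1.0 - NormalDist().cdf(z)


def effective_trials(z_local, z_global):
    """Number of independent trials N solving p_global = 1 - (1 - p_local)**N."""
    p_local = p_from_z(z_local)
    p_global = p_from_z(z_global)
    return math.log1p(-p_global) / math.log1p(-p_local)


# Local and global significances quoted in the text.
n_trials = effective_trials(2.9, 1.9)
```

Under this assumption the quoted numbers correspond to an effective trials factor of roughly 15 to 16 over the scanned mass range and the two channels.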

8.2 Cross section limits

We set upper limits on the production cross section of a new resonance following the modified-frequentist \(CL_\mathrm {s}\) method [79, 80]. Exclusion limits can be set as a function of the \({\mathrm{W}^{\prime }}\) boson mass, under the narrow-width approximation. The results are interpreted in the HVT model B [15] which mimics the properties of composite Higgs scenarios, and in the context of the little Higgs model [8]. Typical parameter values for the HVT model B are

$$\begin{aligned} |c_\mathrm{H} | \approx |c_\mathrm {F} |\approx 1, \quad g_\mathrm {V} \ge 3, \end{aligned} \tag{3}$$

where \(c_\mathrm{H} \) describes interactions involving the Higgs boson or longitudinally polarized SM vector bosons, \(c_\mathrm {F}\) describes the direct interactions of the \({\mathrm{W}^{\prime }}\) with fermions, and \(g_\mathrm {V} \) is the typical strength of the new interaction. In this scenario, decays of the \({\mathrm{W}^{\prime }}\) boson into a diboson are dominant and the \({\mathrm{W}^{\prime }}\rightarrow \) WH branching fraction is almost equal to that of the decay into WZ. The parameter points for this scenario are currently not well constrained from experiments [15] because of the suppressed fermionic couplings of the \({\mathrm{W}^{\prime }}\) boson.

The following parameters are used for the interpretation of the results: \(g_\mathrm {V} = 3\), \(c_\mathrm{H} = -1\) and \(c_\mathrm {F} = 1\) in the HVT model B, and \(\cot {2\theta } = 2.3\), \(\cot {\theta } = -0.20799\) in the LH model, where \(\theta \) is a mixing angle parameter that determines the \({\mathrm{W}^{\prime }}\) couplings; \(\cot {2\theta }\) and \(\cot {\theta }\) can be directly related to \(c_\mathrm{H} \) and \(c_\mathrm {F}\).
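The two LH parameter values are not independent. As a consistency check, the double-angle identity for the cotangent reproduces the quoted \(\cot {2\theta }\) from the quoted \(\cot {\theta }\):

$$\begin{aligned} \cot {2\theta } = \frac{\cot ^{2}\theta - 1}{2\cot \theta } = \frac{(-0.20799)^{2} - 1}{2\,(-0.20799)} = \frac{-0.95674}{-0.41598} \approx 2.30. \end{aligned}$$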

The intrinsic width and cross section for both models are listed in Table 3 for several resonance masses. The widths for the HVT model B are computed by means of Eqs. (2.25) and (2.31) in Ref. [15], while the cross sections were obtained using the online tools provided by the authors of Ref. [15]. The width is less than 5 % of the resonance mass for the following parameter values: \(0.95<g_\mathrm {V} <3.76\), \(c_\mathrm{H} = -1\), and \(c_\mathrm {F} = 1\); \(g_\mathrm {V} <3.9\), \(c_\mathrm{H} = -1\), and \(c_\mathrm {F} = 0\); or \(g_\mathrm {V} <7.8\), \(c_\mathrm{H} = 0.5\), and \(c_\mathrm {F} = 0\). The widths for the LH model have been computed by means of Eq. (15) in Ref. [81], and they are less than 5 % of the resonance mass for values of \(0.084<|\cot {\theta } |<1.21\). Hence, in both models we can consider the width to be negligible compared to the experimental resolution.

Figure 6 shows the expected and observed exclusion limits at 95 % confidence level (CL) on the product of the \({\mathrm{W}^{\prime }}\) production cross section and the branching fraction of \({\mathrm{W}^{\prime }}\rightarrow \mathrm {W}\mathrm{H} \) for the electron and muon channels separately, and for the combination of the two. For the combined channels, the observed and expected lower limits on the \({\mathrm{W}^{\prime }}\) mass are 1.4\(~\text {TeV}\) in the LH model and 1.5\(~\text {TeV}\) in the HVT model B. For the electron (muon) channel, the observed and expected lower limits on the \({\mathrm{W}^{\prime }}\) mass are 1.2 (1.3)\(~\text {TeV}\) in the LH model and 1.3 (1.3)\(~\text {TeV}\) in the HVT model B.
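The logic of the \(CL_\mathrm {s}\) construction behind these limits can be illustrated with a single-bin counting experiment. This is a deliberately simplified sketch: it uses the observed event count as the test statistic and ignores systematic uncertainties, whereas the analysis relies on an unbinned shape-based likelihood with profiled nuisance parameters. The function names and the background value in the usage line are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def cls(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b


def upper_limit(n_obs, background, cl=0.95, s_max=100.0, tol=1e-8):
    """Signal yield at which CLs falls to 1 - cl, found by bisection."""
    target = 1.0 - cl
    lo, hi = 0.0, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # CLs decreases monotonically as the signal yield grows.
        if cls(n_obs, mid, background) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# With no observed events the limit is about 3 signal events for any background.
limit_zero_observed = upper_limit(0, 0.3)
```

Dividing by \(CL_\mathrm {b}\) prevents a downward fluctuation of the background from excluding signals to which the search has no sensitivity. An upper limit on the signal yield translates into a cross section limit after division by the integrated luminosity, the signal efficiency, and the relevant branching fractions.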

Fig. 6 Observed (solid) and expected (dashed) upper limits at 95 % CL on the product of the \({\mathrm{W}^{\prime }}\) production cross section and the branching fraction of \({\mathrm{W}^{\prime }}\rightarrow \mathrm {W}\mathrm{H} \) for electron (top) and muon (middle) channels, and the combination of the two channels (lower plot). The products of cross sections and branching fractions for \({\mathrm{W}^{\prime }}\) production in the LH and HVT models are overlaid

8.3 Analysis combination

The limits obtained in this analysis can be combined with two previous results [22, 24], setting limits on the sum of \({\mathrm{W}^{\prime }}\rightarrow \mathrm {W}\mathrm{H} \) and \({\mathrm{Z}}^\prime \rightarrow \mathrm{Z}\mathrm{H} \) production in the context of the HVT model. The search for \({\mathrm{W}^{\prime }}/\mathrm{Z}^{\prime } \rightarrow \mathrm {W}\mathrm{H}/{\mathrm{Z}} \mathrm{H} \rightarrow {\mathrm{q}} '\overline{{\mathrm{q}}} \mathrm{b} \overline{\mathrm{b}} /{\mathrm{q}} \overline{{\mathrm{q}}} {\mathrm{q}} \overline{{\mathrm{q}}} {\mathrm{q}} \overline{{\mathrm{q}}} \) [22] reports limits in the context of the HVT model that can be directly used in the combination. However, while an asymptotic approximation of the \(CL_\mathrm {s}\) procedure was used in the original paper, for the combination the limit is re-evaluated with the \(CL_\mathrm {s}\) procedure reported above. The search for \(\mathrm{Z}^{\prime } \rightarrow {\mathrm{Z}} \mathrm{H} \rightarrow {\mathrm{q}} \overline{{\mathrm{q}}} \tau ^{+}\tau ^{-}\) [24] does not report limits in the context of a \({\mathrm{W}^{\prime }}\) resonance. However, since it is also sensitive to a signal from \({\mathrm{W}^{\prime }}\rightarrow \mathrm {W}\mathrm{H} \rightarrow {\mathrm{q}} '\overline{{\mathrm{q}}} \tau ^{+}\tau ^{-}\) with an efficiency of about 5 % less than for the \(\mathrm{Z}^{\prime } \) signal, it was reinterpreted for the purpose of the combination. The results of the combination are shown in Fig. 7. The limit on the mass of the \({\mathrm{W}^{\prime }}/\mathrm{Z}^{\prime } \) is slightly improved to 1.8\(~\text {TeV}\) compared to the most stringent result reported by the \({\mathrm{W}^{\prime }}/\mathrm{Z}^{\prime } \rightarrow \mathrm {W}\mathrm{H}/{\mathrm{Z}} \mathrm{H} \rightarrow {\mathrm{q}} ^\prime \overline{{\mathrm{q}}} \mathrm{b} \overline{\mathrm{b}} /{\mathrm{q}} \overline{{\mathrm{q}}} {\mathrm{q}} \overline{{\mathrm{q}}} {\mathrm{q}} \overline{{\mathrm{q}}} \) search.

Fig. 7 Observed (full rectangles) and expected (dashed line) combined upper limits at 95 % CL on the sum of the \({\mathrm{W}^{\prime }}\) and \({\mathrm{Z}}^\prime \) production cross sections, weighted by their respective branching fractions of \({\mathrm{W}^{\prime }}\rightarrow \mathrm {W}\mathrm{H} \) and \({\mathrm{Z}}^\prime \rightarrow \mathrm{Z}\mathrm{H} \). The cross section for the production of a \({\mathrm{W}^{\prime }}\) and \({\mathrm{Z}}^\prime \) in the HVT model B, multiplied by its branching fraction for the relevant process, is overlaid. The observed limits of the three analyses entering the combination in the final states, \(\ell \nu \mathrm{b} \overline{\mathrm{b}} \) (full circle), \({\mathrm{q}} \overline{{\mathrm{q}}} \tau ^{+}\tau ^{-}\) [24] (full triangle pointing up), and \({\mathrm{q}} \overline{{\mathrm{q}}} \mathrm{b} \overline{\mathrm{b}} /{\mathrm{q}} \overline{{\mathrm{q}}} {\mathrm{q}} \overline{{\mathrm{q}}} {\mathrm{q}} \overline{{\mathrm{q}}} \) [22] (full triangle pointing down), are overlaid

Fig. 8 Exclusion regions in the plane of the HVT-model couplings (\(g_\mathrm {V} c_\mathrm{H} \), \(g^{2}c_\mathrm {F}/g_\mathrm {V} \)) for three resonance masses, 1, 1.5, and 2\(~\text {TeV}\), where g denotes the weak gauge coupling. The point B of the benchmark model used in the analysis is also shown. The boundaries of the regions excluded by this search are indicated by the solid and dashed lines (the region outside these lines is excluded). The areas indicated by the solid shading correspond to regions where the resonance width is predicted to be more than 7 % of the resonance mass and the narrow-resonance assumption is not satisfied

In Fig. 8, a scan of the coupling parameters and the corresponding observed 95 % CL exclusion contours in the HVT model from the combination of the analyses are shown. The parameters are defined as \(g_\mathrm {V} c_\mathrm{H} \) and \(g^{2}c_\mathrm {F}/g_\mathrm {V} \), related to the coupling strengths of the new resonance to the Higgs boson and to fermions. The range of the scan is limited by the assumption that the new resonance is narrow. A contour is overlaid, representing the region where the theoretical width is larger than the experimental resolution of the searches, and hence where the narrow-resonance assumption is not satisfied. This contour is defined by a predicted resonance width of 7 %, corresponding to the largest resonance mass resolution of the considered searches.
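To locate a given parameter choice in this plane, the two coordinates can be computed directly from the couplings. The sketch below evaluates them for the benchmark values used in Sect. 8.2 (\(g_\mathrm {V} = 3\), \(c_\mathrm{H} = -1\), \(c_\mathrm {F} = 1\)); the numerical value of the weak gauge coupling, \(g \approx 0.65\), and the function name are assumptions introduced for this example.

```python
def coupling_plane_point(g_v, c_h, c_f, g_weak):
    """Coordinates (g_V * c_H, g^2 * c_F / g_V) in the HVT coupling plane."""
    return g_v * c_h, g_weak ** 2 * c_f / g_v


# Benchmark model B parameters from the text; g_weak = 0.65 is an assumed value.
x_b, y_b = coupling_plane_point(g_v=3.0, c_h=-1.0, c_f=1.0, g_weak=0.65)
```

With these inputs the benchmark point lies at approximately \((-3, 0.14)\), illustrating how a large \(g_\mathrm {V} \) suppresses the coupling to fermions relative to the coupling to bosons.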

9 Summary

A search has been presented for new resonances decaying into WH, in which the W boson decays into \(\ell \nu \) with \(\ell = \mathrm {e}\), \(\mu \) and the Higgs boson decays to a pair of bottom quarks. Each event is reconstructed as a leptonic W boson candidate recoiling against a jet with mass compatible with the Higgs boson mass. A specialized b tagging method for Lorentz-boosted Higgs bosons is used to further reduce the background from multijet processes. No excess of events above the standard model prediction is observed in the muon channel, while an excess with a local significance of 2.9 standard deviations is observed in the electron channel near \(\text{ M }_{\mathrm {W} \mathrm{H} } \approx 1.8~\text {TeV} \). The results are statistically compatible with the standard model within 2 standard deviations. In the context of the little Higgs and the heavy vector triplet models, upper limits at 95 % confidence level are set on the \({\mathrm{W}^{\prime }}\) production cross section in a range from 100 to 10\(\text {\,fb}\) for masses between 0.8 and 2.5\(~\text {TeV}\), respectively. Within the little Higgs model, a lower limit on the \({\mathrm{W}^{\prime }}\) mass of 1.4\(~\text {TeV}\) has been set. A heavy vector triplet model that mimics the properties of composite Higgs models has been excluded up to a \({\mathrm{W}^{\prime }}\) mass of 1.5\(~\text {TeV}\). In this latter context, the results have been combined with related searches, improving the lower limit up to \(\approx \)1.8\(~\text {TeV}\). This combined limit is the most restrictive to date for \({\mathrm{W}^{\prime }}\) decays to a pair of standard model bosons.