Citation graph, weighted impact factors and performance indices

Scientometrics

Abstract

A scheme for evaluating the impact of a given scientific paper, based on the importance of the papers citing it, is investigated. Introducing a weight for a given citation, dependent on the previous scientific achievements of the author of the citing paper, we define the weighting factor of a given scientist. Technically, the weighting factors are defined by the components of the normalized leading eigenvector of the matrix describing the citation graph. The weighting factor of a given scientist, reflecting the scientific output of the other researchers citing his or her work, allows us to define the weighted number of citations of a given paper, the weighted impact factor of a journal, and the weighted Hirsch index of an individual scientist or of an entire scientific institution.

References

  • Adler, R., Ewing, J., & Taylor, P. (2008). Citation statistics. A report IMU-ICIAM-IMS. New York: Wiley.

  • Aksnes, D. W. (2003). A macro-study of self-citations. Scientometrics, 56, 235–246.

  • Aksnes, D. W. (2008). When different persons have an identical author name. How frequent are homonyms? JASIST, 59, 838–841.

  • Aksnes, D. W., & Rip, A. (2009). Researchers’ perceptions of citations. Research Policy, 38, 895–905.

  • Althouse, B. M., West, J. D., Bergstrom, T. C., & Bergstrom, C. T. (2009). Differences in impact factor across fields and over time. Journal of the American Society for Information Science and Technology, 60, 27–34.

  • An, Y., Janssen, J., & Milios, E. E. (2004). Characterizing and mining the citation graph of the computer science literature. Knowledge & Information Systems, 6, 664–678.

  • Banks, M. G. (2006). An extension of the Hirsch index: Indexing scientific topics and compounds. Scientometrics, 69, 161–168.

  • Batista, P. D., Campiteli, M. G., Kinouchi, O., & Martinez, A. S. (2006). Is it possible to compare researchers with different scientific interest? Scientometrics, 68, 179–189.

  • Bergstrom, C. (2007). Eigenfactor: Measuring the value and the prestige of scholarly journals. C&RL News, 68(5).

  • Bernstein, D. S. (2005). Matrix mathematics. Princeton: Princeton University Press.

  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30, 107–117.

  • Bryan, K., & Leise, T. (2006). The $25,000,000,000 eigenvector: the linear algebra behind Google. SIAM Review, 48, 569–581.

  • Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google's PageRank algorithm. Journal of Informetrics, 1, 8–15.

  • Falagas, M. E., & Alexiou, V. G. (2008). The top-ten in journal impact factor manipulations. Archivum Immunologiae et Therapiae Experimentalis, 56, 223–226.

  • Fowler, J. H., & Aksnes, D. W. (2007). Does self-citation pay? Scientometrics, 72, 427–437.

  • Frandsen, T. F. (2007). Journal self-citations—analysing the JIF mechanism. Journal of Informetrics, 1, 47–58.

  • Garfield, E. (1979). Citation indexing. New York: Wiley.

  • Garfield, E. (1994). The impact factor. Current Contents, 29.

  • Glänzel, W. (2006). On h-index. A mathematical approach to a new measure of publication activity and citation impact. Scientometrics, 67, 315–321.

  • Glänzel, W., Debackere, K., Thijs, B., & Schubert, A. (2006). A concise review on the role of author self-citations in information science, bibliometrics and science policy. Scientometrics, 67, 263–277.

  • Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. PNAS, 102, 16569–16572.

  • Hirsch, J. E. (2007). Does the h index have predictive power? PNAS, 104, 19193–19198.

  • Laloë, F., & Mosseri, R. (2009). Not even right, not even wrong. Europhysics News, 40(5), 27–29.

  • Langville, A. N., & Meyer, C. D. (2005). A survey of eigenvector methods for web information retrieval. SIAM Review, 47, 135–161.

  • Lehmann, S., Lautrup, B. E., & Jackson, A. D. (2003). Citation networks in high energy physics. Physical Review E, 68, 026113.

  • Ma, N., Guan, J., & Zhao, Y. (2008). Bringing PageRank to the citation analysis. Information Processing & Management, 44, 800–810.

  • Marshall, A. W., & Olkin, I. (1979). Inequalities: Theory of majorization and its applications. New York: Academic Press.

  • Molinari, J. F., & Molinari, A. (2008). A new methodology for rating scientific institutions. Scientometrics, 75, 163–174.

  • Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74, 036104.

  • Nielsen, M. A. (2008, Dec). Lectures on the Google Technology Stack. http://michaelnielsen.org.

  • Plomp, R. (1994). The highly cited papers of professors as an indicator of a research group’s scientific performance. Scientometrics, 29, 377–393.

  • Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality in citation distribution: Towards an objective measure of scientific impact. Proceedings of the National Academy of Sciences of the United States of America, 105, 17268–17272.

  • Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80, 056103.

  • Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. European Physical Journal B, 4, 131–134.

  • Schreiber, M. (2007). A case study of the Hirsch index for 26 non-prominent physicists. Annalen der Physik, 16, 640–652.

  • Schubert, A. (2007). Successive h-indices. Scientometrics, 70, 183–200.

  • Seglen, P. O. (1997). Why the impact factor should not be used for evaluating research. BMJ, 314, 497–502.

  • Woeginger, G. J. (2008). An axiomatic characterization of the Hirsch index. Mathematical Social Sciences, 56, 224–232.

  • Życzkowski, K. (2009). How to get an ERC grant? Europhysics News, 40(5), 27–29.

Author information

Corresponding author

Correspondence to Karol Życzkowski.

Appendices

Appendix A: Some exemplary graph matrices and their leading eigenvectors

In this appendix we provide examples of some simple matrices and analyze the properties of their leading eigenvectors. Although a matrix of small size N directly represents only a small citation graph describing a group of N scientists, it can also be applied to model a huge graph with a sub-graph structure, in which each vertex represents a given field or subfield of science. Studying even such oversimplified cases can therefore help in understanding the properties of the connectivity matrix of a citation graph and its leading eigenvector.

Let us start with the simplest case of N = 2,

$$ C_2=\left[ \begin{array}{cc} 0 & a\\ b & 0 \end{array} \right], \quad x=\left[ \begin{array}{c} \sqrt{a} \\ \sqrt{b} \end{array} \right]. $$
(17)

The leading eigenvalue reads \(\lambda=\sqrt{ab},\) and in this case the weights \(x_i\) given by the corresponding eigenvector are proportional to the square root of the flow between the vertices. Obviously this is no longer the case for larger graphs,

$$ C_3=\left[ \begin{array}{ccc} 0 & a & 0\\ 0 & 0 & b \\ c & 0 & 0 \end{array} \right] \quad \quad x=\left[ \begin{array}{c} a^{2/3}b^{1/3}\\ b^{2/3}c^{1/3}\\ a^{1/3}c^{2/3} \end{array} \right]. $$
(18)
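As a quick check of Eq. (18), added here for the reader and not part of the original derivation, one can act with \(C_3\) on the stated vector and read off the eigenvalue:

$$ C_3\,x=\left[ \begin{array}{c} a\,b^{2/3}c^{1/3}\\ b\,a^{1/3}c^{2/3}\\ c\,a^{2/3}b^{1/3} \end{array} \right] =(abc)^{1/3}\left[ \begin{array}{c} a^{2/3}b^{1/3}\\ b^{2/3}c^{1/3}\\ a^{1/3}c^{2/3} \end{array} \right], \qquad \lambda=(abc)^{1/3}. $$

Each weight thus depends on all three flows along the cycle, not only on the flow entering the corresponding vertex.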

The following numerical example shows that the weights given by the leading eigenvector grow slower than linearly with the average entry in each row,

$$ C_4 =\left[ \begin{array}{cccc} 0 & 1 & 1 & 1\\ 2 & 0 & 2 & 2\\ 3 & 3 & 0 & 3\\ 4 & 4 & 4 & 0 \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.3223\\ 0.5738\\ 0.7755\\ 0.9409 \end{array} \right]. $$
(19)
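A short derivation, added as a reader's check rather than taken from the original text, makes the sub-linear growth explicit. Row i of \(C_4\) equals i times the all-ones row with the diagonal entry removed, so with \(s=\sum_j x_j\) the eigenvalue equation \(C_4 x=\lambda x\) reads

$$ \lambda x_i = i\,(s-x_i) \quad \Longrightarrow \quad x_i=\frac{i\,s}{\lambda+i}, \qquad \sum_{i=1}^{4}\frac{i}{\lambda+i}=1. $$

Solving the last condition numerically gives \(\lambda\approx 7.11\). The ratio \(x_i/i=s/(\lambda+i)\) decreases with i; for instance \(x_2/x_1=2(\lambda+1)/(\lambda+2)\approx 1.78\), in agreement with the components listed in Eq. (19).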

Consider now some other numerical examples of size four,

$$ C =\left[ \begin{array}{cccc} 0 & 4 & 0 & 0\\ 6 & 0 & 2 & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 3 & 0 \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.6325 \\ 0.7746 \\ 0 \\ 0 \end{array} \right]. $$
(20)

Observe that citations by authors whose papers were never cited do not contribute at all to the weighting index!
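The example of Eq. (20) can be reproduced numerically. The sketch below is only an illustration, assuming NumPy is available; the helper name `leading_eigenvector` and the choice of unit Euclidean normalization are assumptions of this sketch, not something prescribed by the text.

```python
import numpy as np

def leading_eigenvector(C):
    """Eigenvalue with the largest real part and its eigenvector.

    Assumption of this sketch: the vector is scaled to unit Euclidean
    norm and its overall sign is chosen so that the entries are >= 0.
    """
    vals, vecs = np.linalg.eig(np.asarray(C, dtype=float))
    k = np.argmax(vals.real)          # index of the leading eigenvalue
    x = np.real(vecs[:, k])
    if x.sum() < 0:                   # eigenvectors are defined up to sign
        x = -x
    return vals[k].real, x / np.linalg.norm(x)

# Matrix of Eq. (20): row i holds the citations received by author i,
# column j the citations given by author j. Author 3 is never cited.
C20 = [[0, 4, 0, 0],
       [6, 0, 2, 1],
       [0, 0, 0, 0],
       [0, 0, 3, 0]]

lam20, x20 = leading_eigenvector(C20)
print("leading eigenvalue:", round(lam20, 4))
print("weights:", np.round(x20, 4))
```

Author 4 is cited only by author 3, whose own weight vanishes, so the weight of author 4 vanishes as well.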

$$ C = \left[ \begin{array}{cccc} 0 & 6 & 6 & 6\\ 2 & 0 & 1 & 1\\ 0 & 1 & 0 & 1\\ 1 & 0 & 1 & 0\\ \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.8975\\ 0.4228\\ 0.1247\\ 0.2036 \end{array} \right]. $$
(21)

Similarly, a citation by a junior scientist whose papers have received little attention from the scientific community is much less important than a citation by an accomplished author. This is seen by comparing the third and the fourth components of the eigenvector of the above citation matrix, in which the first two rows represent a renowned researcher and a less experienced author, respectively.

It is instructive to analyze the case of two weakly connected subgraphs, represented below by the first and the second pair of nodes. If the coupling between the subgraphs is symmetric, \(C_{2,4} = C_{4,2},\) the leading eigenvector has equal weight in both subspaces,

$$ C = \left[ \begin{array}{cccc} 0 & 4 & 0 & 0\\ 6 & 0 & 0 & 1\\ 0 & 0 & 0 & 4\\ 0 & 1 & 6 & 0 \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.4197\\ 0.5691\\ 0.4197\\ 0.5691 \end{array} \right], $$
(22)

However, if the fluxes between the two subgraphs start to differ, the weight of the leading eigenvector moves toward the distinguished subsystem,

$$ C = \left[ \begin{array}{cccc} 0 & 4 & 0 & 0\\ 6 & 0 & 0 & 1\\ 0 & 0 & 0 & 4\\ 0 & 0.5 & 6 & 0 \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.4939\\ 0.6502\\ 0.3493\\ 0.4597 \end{array} \right], $$
(23)
$$ C =\left[ \begin{array}{cccc} 0 & 4 & 0 & 0\\ 6 & 0 & 0 & 1\\ 0 & 0 & 0 & 4\\ 0 & 0.1 & 6 & 0 \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.5913\\ 0.7480\\ 0.1870\\ 0.2365 \end{array} \right]. $$
(24)
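The trend in Eqs. (22)–(24) can be checked with a few lines of code. This is a minimal sketch assuming NumPy; the function names and the "share" diagnostic (squared weight carried by the first pair of nodes) are illustrative choices of this sketch only.

```python
import numpy as np

def leading_eigenvector(C):
    """Unit-norm, non-negative eigenvector of the leading eigenvalue."""
    vals, vecs = np.linalg.eig(C)
    x = np.real(vecs[:, np.argmax(vals.real)])
    x = x if x.sum() >= 0 else -x
    return x / np.linalg.norm(x)

def coupled_pairs(eps):
    """Matrix of Eqs. (22)-(24); eps plays the role of the entry C_{4,2}."""
    return np.array([[0, 4,   0, 0],
                     [6, 0,   0, 1],
                     [0, 0,   0, 4],
                     [0, eps, 6, 0]], dtype=float)

def first_pair_share(eps):
    """Fraction of the squared norm located on nodes 1 and 2."""
    x = leading_eigenvector(coupled_pairs(eps))
    return float(np.sum(x[:2] ** 2))

for eps in (1.0, 0.5, 0.1):
    x = leading_eigenvector(coupled_pairs(eps))
    print(eps, np.round(x, 4), round(first_pair_share(eps), 3))
```

As the citations flowing from the first pair to the second pair are reduced, the share of the first pair grows from one half toward one.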

If the two subgraphs are not connected, the leading eigenvalue is degenerate and one finds a corresponding eigenvector localized exclusively in the more populated subspace,

$$ C =\left[ \begin{array}{cccc} 0 & 4 & 0 & 0\\ 6 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 3 & 0 \end{array} \right], \quad \quad x \approx \left[ \begin{array}{c} 0.6325\\ 0.7746\\ 0\\ 0 \end{array} \right]. $$
(25)

To lift such a degeneracy one may modify the analyzed matrix C by forming its convex combination with the flat matrix S such that \(S_{ij} = 1/N.\) In this way one assures (Brin and Page 1998) that the leading eigenvector of C(p) = (1 − p)C + pS can be obtained by starting from the flat vector with all entries equal, \(w_i = 1/N,\) and applying the matrix C(p) to it sufficiently many times.
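The iteration described above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the value p = 0.15, the renormalization of the vector after every step (needed because C is not a stochastic matrix), the tolerance, and all function names are choices made for this sketch.

```python
import numpy as np

def mixed_matrix(C, p):
    """Convex combination C(p) = (1 - p) C + p S with S_ij = 1/N."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    S = np.full((n, n), 1.0 / n)
    return (1.0 - p) * C + p * S

def power_iteration(M, tol=1e-12, max_iter=10_000):
    """Iterate the flat vector w_i = 1/N by M until it stops changing."""
    n = M.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        w_next = M @ w
        w_next /= np.linalg.norm(w_next)   # assumption: keep unit norm
        if np.linalg.norm(w_next - w) < tol:
            return w_next
        w = w_next
    return w                                # last iterate if not converged

# Disconnected example of Eq. (25), mixed with the flat matrix.
C25 = [[0, 4, 0, 0],
       [6, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 3, 0]]
M = mixed_matrix(C25, p=0.15)              # p = 0.15 is an assumed value
x = power_iteration(M)
print(np.round(x, 4))
```

For p > 0 every entry of C(p) is positive, so all components of the resulting vector are positive, including those of the smaller subgraph. Convergence may still take many iterations when p is small.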

Appendix B: Practical remarks on evaluating the weighting vector

Selection of the data

The key issue in constructing the citation graph is access to a reliable database covering the scientific literature. For instance, one may rely on the data provided by the ISI Web of Science, although some experts claim that it is biased toward scientific journals published in English and does not cover the entire literature uniformly. Alternatively, one may choose to use a publicly open web search engine, such as Google Scholar. In this case it is believed that Google does not systematically cover the earlier scientific literature. Furthermore, it is not clear how to set simple criteria for deciding which web documents should be taken into account. On the one hand, one might restrict attention to papers published in a scientific journal that appears on a previously compiled list of accepted sources. On the other hand, due to the popularity of various web archives and preprint repositories (like arxiv.org), one might also accept formally unpublished preprints posted there. In such a case special care has to be taken to avoid double counting the same article, first deposited in an archive and later published in a journal, often under a slightly changed title.

Different fields of science

As illustrated with some simple matrix examples, if two fields of science are not coupled by any cross-citations, the leading eigenvector describes only the scientists working in the larger field. Similarly, if two fields of science are coupled only weakly by a few cross-citations, the leading eigenvector tends to be localized in the subgraph with more scientists, papers and citations, so the weighting factors handicap researchers working in a less popular subfield. The splitting of the entire graph into subgraphs can be defined in an objective way by applying the method of Newman (2006) to find the community structure in the citation graph. Since it is well known that citation patterns depend on the branch of science (Batista et al. 2006, Althouse et al. 2009), one should rather analyze the two subgraphs separately, or renormalize the leading eigenvector separately for each subfield. This is consistent with a rather general ’rule of thumb’: bibliometric data should be normalized against the average computed for scientists working in a similar field of science in the corresponding window of time (Radicchi et al. 2008).
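One possible reading of the per-subfield renormalization is sketched below, assuming NumPy. Dividing each weight by the mean weight of its own field is an assumption of this sketch (motivated by the rule of thumb quoted above), and the field labels "A" and "B" are hypothetical.

```python
import numpy as np

def renormalize_by_field(weights, fields):
    """Divide every weight by the mean weight within its own field.

    Assumption: the field mean is used as the reference value; other
    reference values (median, field maximum) are equally conceivable.
    """
    weights = np.asarray(weights, dtype=float)
    fields = np.asarray(fields)
    out = np.empty_like(weights)
    for f in np.unique(fields):
        mask = fields == f
        mean = weights[mask].mean()
        out[mask] = weights[mask] / mean if mean > 0 else 0.0
    return out

# Eigenvector of Eq. (24); nodes 1-2 and 3-4 are treated as two fields.
x24 = [0.5913, 0.7480, 0.1870, 0.2365]
labels = ["A", "A", "B", "B"]               # hypothetical field labels
rel = renormalize_by_field(x24, labels)
print(np.round(rel, 3))
```

After this step the two structurally equivalent pairs receive nearly identical relative weights, although the raw weights of the second pair are about three times smaller.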

Degeneracy in names

It might not be easy to distinguish papers written by different scientists who publish under the very same name (Aksnes 2008). In principle one may try to distinguish them by scientific discipline, affiliation and the time window of their publishing activity, but it is unrealistic to expect that the success rate will approach unity. On the other hand, it is reasonable to conjecture that failing to distinguish between scientists with the same name will not have much impact on the weighting indices of the other researchers in the graph, since the weights of the links are averaged.

Period of the scientific activity

It would be unwise to compare the weighting indices of two researchers of very different ages or living in different times. The number of universities, scientists, journals, papers and citations keeps growing fast. Hence one should expect that a comparison of two scientists with equally valuable accomplishments, whose scientific contributions are already forgotten (and whose papers are no longer cited), would reveal that the scientist active more recently is characterized by a larger weighting factor.

Uniqueness of the leading eigenvector of the citation matrix

A matrix C is called reducible if it can be transformed by a permutation P into a matrix with a zero block below the diagonal, \(C^{\prime}=PCP^T= \left[ \begin{array}{cc} D_1 & Z \\ 0 & D_2 \end{array} \right],\) where \(D_1\) and \(D_2\) are square matrices. Otherwise the matrix is called irreducible. The Frobenius-Perron theorem implies that for any irreducible non-negative matrix C the leading eigenvalue \(z_1\) is non-degenerate, so the corresponding real eigenvector \({\vec x}\) is unique up to normalization. If C is in addition primitive (aperiodic), its spectral gap is positive, \(\gamma := z_1 - |z_2| > 0.\) The size of the spectral gap governs the speed of convergence of any initial vector iterated repeatedly by C to the invariant state \({\vec x}=C{\vec x}.\)
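For small matrices the spectral gap can be evaluated directly from the eigenvalues. The sketch below assumes NumPy; the helper name `spectral_gap` is illustrative, and the two test matrices are taken from Appendix A.

```python
import numpy as np

def spectral_gap(C):
    """Gap z_1 - |z_2| between the two largest eigenvalue moduli of C."""
    vals = np.linalg.eigvals(np.asarray(C, dtype=float))
    moduli = np.sort(np.abs(vals))[::-1]    # descending order
    return float(moduli[0] - moduli[1])

# Two-node matrix of Eq. (17) with a = 4, b = 6: eigenvalues +-sqrt(ab).
C2 = [[0, 4],
      [6, 0]]

# Matrix of Eq. (21), which contains cycles of length two and three.
C21 = [[0, 6, 6, 6],
       [2, 0, 1, 1],
       [0, 1, 0, 1],
       [1, 0, 1, 0]]

gap2 = spectral_gap(C2)
gap21 = spectral_gap(C21)
print("gap for Eq. (17):", round(gap2, 6))
print("gap for Eq. (21):", round(gap21, 6))
```

The two-node graph is periodic, so its eigenvalues come as a pair of equal modulus and the gap defined above vanishes, whereas the matrix of Eq. (21) has a clearly positive gap and plain iteration converges.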

The initial citation matrix C analyzed in this paper could in principle be reducible, but due to numerous cross-citations between various researchers and subfields this possibility seems unlikely. Furthermore, the auxiliary (N + 1)-th node of the graph, representing all scientists outside the ensemble under investigation, introduces additional connectivity and hence increases (on average) the spectral gap.

The size of the spectral gap for the graph matrix describing the entire scientific literature has to be determined in a numerical experiment. If the gap turns out to be too small to ensure a convergence time realistic for practical implementations, one may always introduce a suitable modification of the citation matrix C. For instance, following the original idea of PageRank (Brin and Page 1998), one could mix C with the flat matrix S such that \(S_{ij} = 1/N\) (see also Langville and Meyer 2005, Bryan and Leise 2006).

About this article

Cite this article

Życzkowski, K. Citation graph, weighted impact factors and performance indices. Scientometrics 85, 301–315 (2010). https://doi.org/10.1007/s11192-010-0208-6

  • DOI: https://doi.org/10.1007/s11192-010-0208-6
