
Revision 44 as of 2012-11-30 09:13:16


Scholar H-Index Calculator - Home page


About Scholar H-Index Calculator

Scholar H-Index Calculator (hereafter, the Calculator) is an addon for Firefox and Google Chrome that enhances Google Scholar result pages by showing a number of bibliometric indicators computed from the data displayed on screen. Once installed, the Calculator works transparently when querying Google Scholar: as soon as you submit a query, result pages are enriched with useful data (e.g. the h-index computed from the displayed results), and new functions become available.


The Team

Project Coordinator

Developers Team

Are you a research or industrial investor interested in financing the development of the Calculator? Do you need help or have comments on the Calculator? You can contact us at shi_AT_mat.unical.it (replace _AT_ with '@' to obtain our email address).


Download


Documentation

Usage notes

Just point your browser to scholar.google.com and make a query! Once installed, the addon displays the h-index, g-index, e-index and other impact measures for the submitted query at the top of Google Scholar result pages.

Disclaimer

The computed values are obtained from Google Scholar output (and from the Web) as is, and might include self-citations, inaccuracies, and informal and ghost citations. The author takes no responsibility for the accuracy of index values, nor for blind, non-manually-cleaned and inaccurate use of this tool in official comparisons between authors and/or between journals.

You might be here because you are convinced that "high h-index = good scientist". Wrong. Have a look at this. If, on the other hand, you wrongly think that bibliometric analyses bring no information at all, read, e.g., this.

Advanced Interface

Since version 2.0, an advanced interface mode can be enabled by clicking on the corresponding link located below the query textfield. Note that the addon advanced interface is not the same as the pre-existing Advanced Scholar Search link located to the right of the query textfield. When the Addon Advanced interface is enabled, a set of controls is visible for each paper. These controls allow you:

  1. to select or deselect a single paper: deselected papers do not contribute to the computation of impact indices.

  2. to manually increase or decrease the number of self-citations. Self-citations are subtracted from the total citation count of a given paper.

  3. to manually increase or decrease the number of authors of a given paper: this is useful when the number of authors reported by Google Scholar is not accurate and manual fixing is required (typically when a paper has more than 4-5 authors). Note: semi-automatic author fixing is possible since version 2.1 (see the Author lists refinement section below).

  4. to load and save data: users can save their analysis by simply saving the current page once they have completed it.

Tips and description of impact indices

To obtain accurate results, set the number of results per page to 100 in your Google Scholar preferences. In compliance with the Google Scholar terms of service (which prohibit automatic querying), only the displayed result page is processed.

As of version 3.0 (and the current 2.96 beta), the Calculator can also parse papers beyond the 100th: the message "Do you want to add more or all data?" contains two links, more and all. Clicking more appends the next result page to the current page, and indices are recomputed accordingly. Clicking all appends all the result pages, up to the 1000th paper (an intrinsic Google Scholar limit), and recomputes the indices. In both cases, the added papers appear at the bottom of the page.

If you aim at computing your own index values accurately, I strongly suggest using both the addon Advanced interface and the Advanced Scholar Search form already provided by Google Scholar. In the Advanced Scholar Search page, fill the Return articles written by.. field with the name at hand in quotes (e.g. "Giovambattista Ianni"). In the same form, you might also want to restrict the search to the relevant field of expertise. You can then use the Addon Advanced Interface (its link is located below the query textfield) to perform a fine-grained analysis (toggling papers on and off, etc.).

delta-H and delta-G

These two values measure the minimum number of new citations needed to increase the current h-index (g-index, respectively) by 1. Let c[i] be the number of citations of the paper in position i, with papers sorted by decreasing number of citations. For h the current h-index, delta-H is computed as

$$\Delta H = \bigl((h+1) - c[h+1]\bigr) + \sum_{i=1}^{h} \max\bigl((h+1) - c[i],\ 0\bigr)$$

and, for g the current g-index, delta-G is computed as

$$\Delta G = (g+1)^2 - \sum_{i=1}^{g+1} c[i]$$

To increase the h-index by 1, all of the first h+1 papers must reach at least h+1 citations, so the delta-H new citations must go to those papers that have fewer than h+1 citations. Usually, the (h+1)-th paper is the main culprit for the value of delta-H. Gaining a citation on a paper that already has h+1 citations does not help decrease delta-H; conversely, to increase the g-index, any new citation on the first g+1 papers counts, no matter how it is distributed.

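delta-H and delta-G are meant to measure how difficult it would be for the author at hand to increase his/her h-index and g-index. Note that their range is relatively small: the maximum values are delta-H = 2h+1 (reached when the (h+1)-th paper has no citations and each of the first h papers has exactly h) and delta-G = 2g+1 (reached when the first g papers total exactly g^2 citations and the (g+1)-th paper has none).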
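The two definitions above can be checked with a short Python sketch. It is an illustration, not the Calculator's code: the function names are hypothetical, citation counts are passed as a plain list and sorted in decreasing order, and a missing paper counts as 0 citations. It also assumes the g-index convention in which such zero-citation placeholders are allowed (so the g-index may exceed the number of papers); under this convention delta-G is always positive.

```python
def sorted_desc(citations):
    """Citation counts in decreasing order: c[0] is the most cited paper."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    c = sorted_desc(citations)
    h = 0
    while h < len(c) and c[h] >= h + 1:
        h += 1
    return h


def g_index(citations):
    """Largest g such that the top g papers total at least g^2 citations
    (papers beyond the end of the list count as 0 citations)."""
    c = sorted_desc(citations)
    prefix, k = 0, 0
    while True:
        k += 1
        prefix += c[k - 1] if k <= len(c) else 0
        if prefix < k * k:  # c is decreasing, so the condition fails for every larger k too
            return k - 1


def delta_h(citations):
    """New citations needed for h+1: each of the first h+1 papers needs h+1."""
    c = sorted_desc(citations)
    h = h_index(c)
    c += [0] * (h + 1 - len(c))  # pad with zero-citation papers if needed
    return sum(max((h + 1) - c[i], 0) for i in range(h + 1))


def delta_g(citations):
    """New citations needed for g+1: the first g+1 papers need (g+1)^2 in total."""
    c = sorted_desc(citations)
    g = g_index(c)
    c += [0] * (g + 1 - len(c))  # pad with zero-citation papers if needed
    return (g + 1) ** 2 - sum(c[: g + 1])


cites = [10, 6, 4, 2, 1]
print(h_index(cites), delta_h(cites))  # 3 2
print(g_index(cites), delta_g(cites))  # 4 2
```

In this five-paper example, the 4th paper has 2 citations and needs 2 more to reach h+1 = 4, while the top 5 papers total 23 citations, 2 short of 5^2 = 25.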

Normalized Values

Normalized values (the second row of displayed data) are computed by normalizing the number of citations of each paper. From version 2.3 on, there are two types of normalization.

Normalization per author: if paper i has been cited t times and has k authors, its number of normalized citations is t/k. All the index values in the row normalized per author are computed on these normalized values; in particular, the normalized h-index corresponds to h_{I,Norm} of Publish or Perish. Due to limitations of the Google Scholar output format, Scholar H-Index truncates the author count to 4 (or 5) for papers with more than 4 authors. In such cases, the displayed h_{I,Norm} must be taken as an upper-bound estimate of its real value (take care, especially in fields like Biology and Chemistry, where 12 or more authors per paper is common). For a finer-grained value, the Addon Advanced Interface allows you to semi-automatically fix the number of authors to the accurate value (see above).
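A minimal Python sketch of normalization per author and of the resulting h_{I,Norm}, assuming each paper is a dict with hypothetical keys "citations" and "authors". The second normalization caps the author count at 4 to mimic the truncation described above: since a smaller denominator can only increase each t/k, the h-index computed on capped counts can never be lower than the one computed on accurate counts, which is why the displayed value is an upper bound.

```python
def h_index(values):
    """Largest h such that h papers have at least h (possibly fractional) citations."""
    v = sorted(values, reverse=True)
    h = 0
    while h < len(v) and v[h] >= h + 1:
        h += 1
    return h


def per_author(papers):
    """Normalized citations t/k for each paper."""
    return [p["citations"] / p["authors"] for p in papers]


def per_author_capped(papers, cap=4):
    """Same, but with the author count truncated to `cap`, as when author lists are cut short."""
    return [p["citations"] / min(p["authors"], cap) for p in papers]


# Three hypothetical papers, each with 20 citations and 10 authors
papers = [{"citations": 20, "authors": 10} for _ in range(3)]
print(h_index(per_author(papers)))         # 2 (2.0 normalized citations each)
print(h_index(per_author_capped(papers)))  # 3 (5.0 each, an overestimate)
```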

Normalization per age: if paper i has been cited t times and was published in 2001, its number of citations normalized per age is t/(CY-2001+1), where CY is the current year. This corresponds to the contemporary h-index of Sidiropoulos et al. with parameters delta=1 and gamma=1. Note that indices (h, g and e) computed on values normalized per age coincide neither with any of the Age Weighted metrics displayed by Publish or Perish nor with the hc-index (which is the contemporary h-index with delta=1 and gamma=4).

Although this metric is open to criticism (values change abruptly every Jan 1st at 0:00:01, and citations of old papers become progressively less influential year by year), we find that it has an intuitive symmetry and interpretation, compared both with plain indices and with indices normalized per co-authorship.
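The per-age normalization can be sketched in Python through the contemporary score of Sidiropoulos et al., gamma * t / (CY - year + 1)^delta, so that the two parameter settings mentioned above can be compared side by side. The function names and the (citations, year) tuples are illustrative, and the current year is passed explicitly rather than read from the clock so that results are reproducible.

```python
def contemporary_score(citations, year, current_year, gamma=1.0, delta=1.0):
    """gamma * citations / (current_year - year + 1) ** delta."""
    return gamma * citations / (current_year - year + 1) ** delta


def h_index(values):
    """Largest h such that h papers have a score of at least h."""
    v = sorted(values, reverse=True)
    h = 0
    while h < len(v) and v[h] >= h + 1:
        h += 1
    return h


papers = [(30, 2001), (20, 2010), (9, 2011)]  # hypothetical (citations, publication year)
per_age = [contemporary_score(c, y, 2012) for c, y in papers]      # gamma=1, delta=1
hc = [contemporary_score(c, y, 2012, gamma=4) for c, y in papers]  # gamma=4, delta=1
print(h_index(per_age), h_index(hc))  # 2 3
```

Note how the 2001 paper, which has the most raw citations, ranks last once normalized per age (30/12 = 2.5).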

Why is my h-index not what I expect?

Computed index values might differ from those of software tools like Publish or Perish, mainly because Publish or Perish is hardwired to query http://scholar.google.com regardless of your actual locale, while the H-Index Calculator works on the locale of your choice (scholar.google.it, scholar.google.co.uk, etc.), and results from different scholar.google.* sites might differ. Also note that queries such as author:"John Doe" return different (and tighter) results than author:John author:Doe, which in turn differ from author:J author:Doe. Finally, selecting the field of expertise and narrowing your search with the filters available in the Scholar Advanced Interface will change index values.

Author lists refinement

This function makes it possible to compute accurate normalized indices (semi-)automatically, overcoming the truncation of the author count to 4 for papers with more than 4 co-authors. If Scholar Preferences are set to display links to BibTeX data, the advanced interface displays a new control named Refine this author list for each paper. Clicking the Refine this author list button of a paper P fills the entry of P with its full list of authors and displays the full name of the journal/conference of P (if data is available). Indices are automatically updated accordingly.

A button named Refine all bibliographic entries is also available, which automatically performs the above refinement for each displayed paper. Be warned that refining all papers generates heavy traffic from your browser to the Google Scholar portal, and might cause Scholar to mistake you for automated software and ask for a captcha.

Version 3.0: complete (aggressive) author list refinement

Google Scholar displays only some of the authors of a given paper P, and this affects the Calculator estimate of the number of authors. Since version 2.1, the Calculator can complete the list of all authors of P: this feature is called Author Refinement and uses BibTeX data (provided by Scholar). It works only if Scholar Preferences are set to display BibTeX data for the selected paper. Note that BibTeX data may be incomplete: in this case, the BibTeX author list ends with the string "others", meaning that the list is incomplete.
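The "others" convention can be illustrated with a small Python sketch that reads the author field of a BibTeX entry and reports whether the list is complete. It is not the Calculator's code: the function name is hypothetical, and it assumes the field is delimited by braces without nested braces (so names with escaped accents such as {\"u} are not handled) and that names are separated by the BibTeX keyword and.

```python
import re


def bibtex_authors(entry):
    """Return (names, complete) for the author field of a BibTeX entry.

    complete is False when the list ends with "others", i.e. the entry
    admits that further authors exist but are not listed.
    """
    match = re.search(r"\bauthor\s*=\s*\{([^{}]*)\}", entry, re.IGNORECASE)
    if match is None:
        return [], False  # no author field: nothing is known
    # BibTeX separates names with the keyword "and"
    names = [n.strip() for n in re.split(r"\s+and\s+", match.group(1)) if n.strip()]
    if names and names[-1].lower() == "others":
        return names[:-1], False
    return names, True


entry = """@article{doe2010example,
  title={An example paper},
  author={Doe, John and Roe, Jane and others},
  year={2010}
}"""
names, complete = bibtex_authors(entry)
print(len(names), complete)  # 2 False: at least 3 authors, exact count unknown
```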

Since the 3.0 release, the Calculator implements a new system that aggressively completes the list of authors. In general terms, the system downloads the Web page corresponding to the paper (whose link is provided by Scholar), reasons inductively on it and extracts the authors that were initially missing. To use the new aggressive refinement system, just click on the Refine this author list button near each record on Scholar, as in previous versions.

We are currently measuring the precision and recall of the system with accurate experiments. Preliminary results show that the system works with 100% precision in most cases. However, aggressive refinement depends heavily on the referenced Web page: if the page does not actually list the authors of the paper, is corrupted, is in PDF format (not supported yet), or is temporarily unreachable, the aggressive refinement system cannot extract accurate data and reports an error message.

Version 3.0: Custom formula editing

As of Calculator 3.0, users can add their own bibliometric formulas and display their outcome next to the default indices. There are two types of custom formulas: Normalizations and Indices.

Normalizations

In the Calculator information box, each row shows bibliometric indices computed under a given Normalization. Each normalization weighs the citations of each paper according to a given criterion. There are three default normalizations:

You can add your own normalization formulas by clicking on the 'New normalization' button at the bottom of the Information box. Two editable textfields will appear: enter the normalization name in the leftmost field and your custom formula in the rightmost one. Click anywhere else when ready: if your formula is correct, you should see a new row in which all the available indices are computed according to your new normalization. Enjoy!

Custom Normalization Formulas Language

Note that normalization formulas are applied on a per-paper basis: they are intended to work in the context of a single paper. For a paper i, a custom normalization formula f(i) returns a (normalized) number of citations, as defined by f. A normalization formula can access the following attributes of paper i:

Allowed symbols:

Some further examples

Custom formulas are visible only when the Advanced interface is enabled.

Indices

Indices correspond to columns in the Calculator information box: each column shows a bibliometric index computed on a given set of papers. Besides the default indices, you can add your own.

Custom index formulas Language

Note that index formulas are applied on the current sorted list of papers (usually the list of entries displayed on screen, sorted by the normalization at hand). The current list of papers can be changed either by a) making a new query, or b) telling the Calculator that you want to add more data to the current set, by clicking on the appropriate links it presents. For a sorted set of papers S, a custom index formula f(S) returns an index value, as defined by f. An index formula can access the attributes of all the papers in the set. For each row in the information box, the corresponding normalization function is applied first, and the papers on screen are sorted by their number of normalized citations; then f is computed for that row, according to the corresponding normalization and the resulting order. The language available for custom index formulas is much richer than the normalization language. The available constructs are listed below.
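The evaluation order described above (normalize, sort, then apply each index formula row by row) can be sketched in Python as follows. This is a simplified model, not the Calculator's implementation: papers are dicts with hypothetical keys, normalizations are plain functions of a single paper, and each index formula here only receives the sorted normalized citation counts, whereas the Calculator's index formulas can also access other paper attributes.

```python
def h_index_sorted(values):
    """h-index of values already sorted in decreasing order."""
    h = 0
    while h < len(values) and values[h] >= h + 1:
        h += 1
    return h


def information_box(papers, normalizations, indices):
    """One row per normalization, one column per index formula."""
    table = {}
    for row, normalize in normalizations.items():
        # Normalize each paper, then sort by normalized citations (most cited first)
        values = sorted((normalize(p) for p in papers), reverse=True)
        # Every index formula of this row sees the same sorted list
        table[row] = {column: f(values) for column, f in indices.items()}
    return table


papers = [{"citations": 20, "authors": 10}] * 3 + [{"citations": 5, "authors": 1}]
normalizations = {
    "plain": lambda p: p["citations"],
    "per author": lambda p: p["citations"] / p["authors"],
}
indices = {"h": h_index_sorted, "total": sum}
for row, columns in information_box(papers, normalizations, indices).items():
    print(row, columns)
# plain {'h': 4, 'total': 65}
# per author {'h': 2, 'total': 11.0}
```

Sorting once per row keeps all index formulas of that row consistent with the same ordering, which matters for rank-based indices such as h.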

In the following, assume that a sorted list of papers S and a normalization function n(i) are given, where i denotes the i-th paper of S.

Special arrays:

x can be any allowed formula.

Special symbols:

Functions:

Aggregate functions are available: these come in two possible forms:

funcName(start,end,variable,expression)

or

funcName(start,end,variable,booleanExpression,expression)

where start and end are expressions denoting the bounds (inclusive) of the numeric range swept by variable; variable is an identifier of your choice, which may appear in expression and in booleanExpression. A booleanExpression has the form expr relOp expr, where relOp is one of <, >, >=, <=, ==, !=.

Currently available aggregate functions are min, max, sum and prod. To illustrate how aggregates work, assume a set of 5 papers with 10, 6, 4, 2 and 1 citations, respectively. Then

    min(1,N,i,citations[i]) = 1
    max(1,N,i,citations[i]) = 10
    sum(1,N,i,citations[i]) = 23

Boolean expressions can be used to filter the papers considered by the aggregate function: only the values of variable for which the boolean expression holds contribute to the result. For instance, the Google My Citations i10-index (the number of publications with at least 10 citations) is

   sum(1,N,i,citations[i] >= 10,1)

As in normalization formulas, expressions may use +, -, /, *, ^ and parentheses, with their usual meaning.
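The semantics of aggregates can be mimicked in Python to check the examples above. In this sketch (hypothetical names, not the Calculator's parser), expression and booleanExpression are passed as functions of the loop variable, the range from start to end is inclusive, and the optional condition comes last because Python requires optional arguments to follow mandatory ones. What the Calculator does on an empty selection is not documented here; in this sketch min and max raise an error, sum returns 0 and prod returns 1.

```python
import math


def aggregate(op, start, end, expression, condition=None):
    """funcName(start,end,variable,[booleanExpression,]expression):
    the variable sweeps start..end inclusive; only values for which
    condition holds are passed to op."""
    values = [expression(i) for i in range(start, end + 1)
              if condition is None or condition(i)]
    return op(values)


# Five papers with 10, 6, 4, 2 and 1 citations; indices start at 1 as in the formulas
citations = {1: 10, 2: 6, 3: 4, 4: 2, 5: 1}
N = len(citations)

print(aggregate(min, 1, N, lambda i: citations[i]))        # 1
print(aggregate(max, 1, N, lambda i: citations[i]))        # 10
print(aggregate(sum, 1, N, lambda i: citations[i]))        # 23
print(aggregate(math.prod, 1, N, lambda i: citations[i]))  # 480
# i10-index: sum(1,N,i,citations[i] >= 10,1)
print(aggregate(sum, 1, N, lambda i: 1, lambda i: citations[i] >= 10))  # 1
```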

Some examples

Indices used in the 2012 Italian "Abilitazione Scientifica Nazionale"

These can be programmed in the following way:

sum(1,N,i,citations[i])/max(1,N,i,year[i]>1900,age[i])

Here sum(1,N,i,citations[i]) is the total number of citations reported, while max(1,N,i,year[i]>1900,age[i]) estimates the academic age of an author as the age of the oldest paper appearing in the current set of papers.
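A Python sketch of the same computation, assuming (hypothetically) that age[i] is defined as in the per-age normalization, i.e. CY - year[i] + 1, and that a missing publication year is stored as 0, which the year[i]>1900 filter excludes. Following the formula literally, such papers still contribute their citations to the numerator.

```python
def citations_per_academic_age(papers, current_year):
    """sum(1,N,i,citations[i]) / max(1,N,i,year[i]>1900,age[i])."""
    total = sum(p["citations"] for p in papers)
    # Assumed: age = current_year - year + 1; papers with implausible years are skipped
    ages = [current_year - p["year"] + 1 for p in papers if p["year"] > 1900]
    return total / max(ages)  # raises ValueError if no paper has a plausible year


papers = [
    {"citations": 30, "year": 2001},
    {"citations": 20, "year": 2010},
    {"citations": 10, "year": 0},  # hypothetical encoding of a missing year
]
print(citations_per_academic_age(papers, 2012))  # 60 / 12 = 5.0
```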


Support for CiteseerX

Automatic index calculation when visiting CiteseerX has been discontinued as of version 2.0.

Release Notes and history

May-Jun 2012. 3.0 Release with many new features:



Selected Publications