25060
Comment:
|
36527
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
#acl GiovambattistaIanni:read,write,admin,delete,revert FrancescoCauteruccio:read,write GuestTester:read All:read = Scholar H-Index Calculator - Home page = <<TableOfContents>> |
#acl GiovambattistaIanni:read,write,admin,delete,revert FrancescoCauteruccio:read,write All:read = Scholar H-Index Calculator for Google Chrome - Official documentation = |
Line 9: | Line 8: |
== About Scholar H-Index Calculator == Scholar H-Index Calculator (the Calculator from now on) is an addon for Firefox and Google Chrome which enhances Google Scholar results pages by showing a number of bibliometric data computed using the data appearing on video as input. Once installed, the Calculator works transparently when querying [[http://scholar.google.com|Google Scholar]]: as soon as you make a query, result pages are enriched with a number of useful data (e.g. the h-index computed on the basis of displayed data), and new functions are available. |
{{{#!wiki caution '''How does the Calculator work? I've installed it and I see no button!!''' == ANSWER: Point your browser to http://scholar.google.com and make a search. You will notice a LOT of NEW STUFF. == '''You will be redirected to this very same page only the first time you install the Calculator, or when an automatic upgrade has been installed. Apologies for the small inconvenience, and have a look at the docs! (or maybe not).''' }}} == About == Scholar H-Index Calculator (the Calculator from now on) is a bibliometric and citation analysis tool which works as an addon for Google Chrome. == Features == * Computes the most common bibliometric indices over Google Scholar pages on the fly * Lets you program your own fancy bibliometric formulas * Can automatically clean self and ghost citations away from computed indices * Can complete author lists when Google Scholar truncates to the first 4 authors * Can group papers based on possible authors' homonymies * Fast and immediate, no contact with other servers besides Google Scholar * No personal data and no activity of yours is logged <<TableOfContents(2)>> == How to use it == Getting started with the Calculator takes just two steps: 1. [[#download|Download]] and install the Calculator using your web browser (Google Chrome is the only supported browser at the moment) 1. Point your browser to [[http://scholar.google.com|Google Scholar]] and make a query! Once installed, the addon displays, on top of Google Scholar result pages, the corresponding h-index, g-index, e-index and other measures of impact for the submitted query. <<Anchor(download)>> == Download == * <<newicon>> [[https://chrome.google.com/webstore/detail/scholar-h-index-calculato/cdpobfbhbdlpbloccjokjgekjnmifbng|Version 4.2 for Google Chrome]]. * [[http://www.mat.unical.it/ianni/storage/scholar3.xpi|Version 3.1 for Firefox]]. ''Note: development for Firefox has been discontinued. New features will be available for Google Chrome only''. |
Line 14: | Line 48: |
---- | |
Line 18: | Line 51: |
* [[http://www.gibbi.com|Giovambattista Ianni]] (Code designing, writing, reviewing, maintainance and refactoring) Developers Team * [[http://www.francescocauteruccio.info/|Francesco Cauteruccio]] (Aggressive refining engine), Susanna Cozza (General code maintainance), Stefano Germano (Custom formulas parser), Maria Carmela Santoro (General code refactoring, Additional results browsing code). |
* [[https://plus.google.com/109508724956029192545|Giovambattista Ianni]] (Code design, programming, reviewing, maintenance and refactoring) Developers Team (in no particular order) * Mauro Ceraso (author clustering engine, new user interface), Massimo Canonaco (self and ghost citations cleaning), [[http://www.francescocauteruccio.info/|Francesco Cauteruccio]] (Aggressive refining engine), Susanna Cozza (General code maintenance), Stefano Germano (Custom formulas parser), Maria Carmela Santoro (General code refactoring, Additional results browsing code). |
Line 26: | Line 59: |
<<Anchor(download)>> ---- == Download == * [[http://www.mat.unical.it/ianni/storage/scholar3.xpi|3.0.1 Release Candidate (June 8th 2012)]] preview. * <<newicon>> Preliminary Chrome Extension ([[https://chrome.google.com/webstore/detail/scholar-h-index-calculato/cdpobfbhbdlpbloccjokjgekjnmifbng|Version 3.1 beta]]) * Official Scholar H-Index Calculator [[https://addons.mozilla.org/en-US/firefox/addon/scholar-h-index-calculator/|page]] at Mozilla. |
|
Line 36: | Line 62: |
---- | |
Line 38: | Line 63: |
=== Usage notes === Just point your browser to scholar.google.com and make a query! Once installed, the addon displays on top of Google Scholar result pages, the corresponding h-index, g-index, e-index and other measures of impact for the submitted query. |
|
Line 47: | Line 70: |
=== Advanced Interface === Since version 2.0 a new advanced interface mode can be enabled by clicking on the corresponding link located '''below''' the query textfield. Note that the addon advanced interface is not the same as the pre-existing ''Advanced Scholar Search'' link located at the '''right''' of the query textfield. When the Addon Advanced interface is enabled, a set of controls is visible per each paper. It is possible in turn: |
=== Tips === To have accurate results, set the number of results per page in your Google Scholar preferences to its maximum, which is 20 papers per page as of 2017. The Google Scholar terms of service prohibit automatic batch querying, thus the Calculator processes just the displayed result page. The Calculator can parse papers beyond the displayed ones: click on the "Want to add 20, 100, or all result?" question to add 20 or 100 more results. When clicking on '''all''', all the result pages, up to the 1000th paper, will be added and indices computed accordingly. You will see the added papers at the bottom of the page. If you aim at computing your own indices values accurately, I strongly suggest using both the addon Advanced interface and the ''Advanced Scholar Search'' form already provided by Google Scholar. In the Advanced Scholar Search page, fill the ''Return articles written by..'' field with the name at hand in quotes (e.g. {{{"Giovambattista Ianni"}}}). Also, in this same form, you might want to restrict the search to the relevant field of expertise. It is then possible to use the new Addon Advanced Interface (it can be activated from the link below the query textfield) for performing fine-grained analysis (toggling papers on and off, etc.). === What are delta-H and delta-G ? === These two values measure the minimum number of citations needed for incrementing the current h-index (g-index, respectively) by 1. In the case of delta-h the value is computed as {{{(h+1)-c[h+1] + sum_(from 1 to h)[max((h+1) - c[i],0)]}}}, where {{{c[i]}}} is the number of citations for the paper in position {{{i}}} (with papers ranked by decreasing number of citations), and {{{h}}} is the current h-index. delta-G is computed as {{{(g+1)^2 - sum(from 1 to g+1)c[i]}}}, for {{{g}}} the current g-index. Note that, to increase the h-index by 1, one has to obtain delta-H new citations on those '''particular''' papers which fail to have h+1 citations: all the first h+1 papers must reach at least h+1 citations. 
Usually, the (h+1)-th paper is the main culprit for the value of delta-H. Gaining a citation on a paper already having h+1 citations does not help in decreasing delta-H; conversely, for increasing the g-index, new citations on the first g+1 papers matter, no matter how they are distributed. delta-H and delta-G should be a measure of how difficult it would be for the author at hand to increase his/her h- and g-index. Note that the range of delta-h and delta-g is relatively small (in the worst scenario, delta-h-max=2h+1 and delta-g-max=2g+1). ==== Normalized Values ==== Normalized values (the rows of data starting from the second line in the calculator table) are computed by normalizing the number of citations found for each paper. There are two default types of normalization: '''Normalization per author''': if paper ''i'' has been cited ''t'' times, and has been written by ''k'' authors, its number of normalized citations is ''t/k''. All the indices values, in the row where data are normalized per author, are computed considering these normalized values. In particular the normalized h-index corresponds to h_{I,Norm} of Publish or Perish. Due to limitations on the Google Scholar output format, Scholar H-Index truncates to 4 (or 5) the author count for papers having more than 4 authors. In such a case, the presented h_{I,Norm} has to be taken as an upper bound estimate of its real value (take care, especially for fields like Biology and Chemistry in which 12+ authors is the usual number). To obtain a finer-grained value, the Addon Advanced Interface allows you to semi-automatically fix the number of authors to the accurate value (see above). '''Normalization per age''': if paper ''i'' has been cited ''t'' times, and has been written in 2001, its number of normalized citations per age is ''t/(CY-2001+1)'', for ''CY'' the current year. The above corresponds to the contemporary h-index of Sidiropoulos et al., with parameters delta=1 and gamma=1. 
Note that indices (h, g and e) computed on values normalized per age do not coincide with any of the Age Weighted metrics displayed by Publish or Perish, nor with hc-index (which is the contemporary h-index with delta=1 and gamma=4). Although this metric can be criticized (values will abruptly change each Jan 1st 0:00:01; citations on old papers become progressively less influential year by year), we find it has an intuitive symmetry and interpretation, compared with plain indices, and compared with indices normalized per co-authorship. === How to use the Advanced Interface === The advanced interface mode can be enabled by clicking on the corresponding link located '''below''' the query textfield. Note that the addon advanced interface is not the same as the pre-existing ''Advanced Scholar Search'' link located at the '''right''' of the query textfield. When the Addon Advanced interface is enabled, a set of controls is visible for each paper. It is possible in turn: |
Line 51: | Line 110: |
1. '''to manually increase or decrease the number of self citations'''. Self citations are stripped from the total citation count of a given paper. | 1. '''to manually increase or decrease the number of self citations'''. Self citations are stripped from the total citation count of a given paper. The 'clean citations' button can compute self citations for you, but it can be slow. |
Line 53: | Line 114: |
Line 55: | Line 117: |
=== Tips and description of impact indices === To have accurate results, set your Google Scholar preferences on the number of results to 100. In agreement with Google Scholar terms of service (which prohibits automatic querying), it is processed just the displayed result page. As of Version 3.0 (and the current 2.96 beta), the Calculator can parse papers beyond the 100th by clicking on the two links on the sign {{{Do you want to add '''more''' or '''all''' data?}}}. '''More''' will trigger the addition of the next result page on the current page (and indices will be re-computed accordingly). When clicking on '''all''', all the result pages, up to the 1000th paper (this is an intrinsic Google Scholar constraint) will be added and indices computed accordingly. In both cases you will see the added papers on the bottom of the page. If you aim at computing your own indices values accurately, I strongly suggest to use both the addon Advanced interface and the ''Advanced Scholar Search'' form already provided by Google Scholar. In the Advanced Scholar Search page, fill the ''Return articles written by..'' field with the name at hand in quotes (e.g. {{{"Giovambattista Ianni"}}}). Also, in this same form, you might want to restrict search to the supposed field of experience. It is then possible to use the new Addon Advanced Interface (it can be activated looking below the query textfield) for performing fine-grained analysis (toggling papers on and off, etc.) ==== delta-H and delta-G ==== These two values measure the minimum number of citations needed for incrementing the current h-index (g-index, respectively), by 1. In the case of delta-h the value is computed as (h+1)-c[h+1] + sum_(from 1 to h)[max((h+1) - c[i],0)], where c[h] is the number of citations for the paper in position h, and h is the current h-index. delta-G is computed as (g+1)^2 - sum(from 1 to g+1)c[i], for g the current g-index. 
Note that for increasing h-index by 1, one has to obtain delta-H new citations on those '''particular''' papers which fail to have h+1 citations: all the first h+1 papers must reach at least h+1 citations. Usually, the (h+1)-th paper is the main culprit for the value of delta-H. Gaining a citation on a paper already having h+1 citations does not help in decreasing delta-H; viceversa, for increasing g-index, any new citation on the first g+1 papers matters, no matter how it is distributed. delta-H and delta-G should be a measure of how difficult would be for the author at hand to increase his/her h and g-index. Note that the range of delta-h and delta-g is relatively small (in the worst scenario, delta-h-max= 2h+1 and delta-g-max=2g+1). ==== Normalized Values ==== Normalized values (the second row of data which is displayed) are computed by normalizing the number of citations found per each paper. From version 2.3 on, there are two types of normalization. '''Normalization per author''': if paper ''i'' has been cited ''t'' times, and has been written by ''k'' authors, its number of normalized citations is ''t/k''. All the indices values, in the row where data are normalized per author, are computed considering these normalized values. In particular the normalized h-index corresponds to h_{I,Norm} of Publish or Perish. Due to limitations on the Google Scholar output format, Scholar H-Index truncates to 4 (or 5) the author count for papers having more than 4 authors. In such a case, the presented h_{I,Norm} has to be taken has an upper bound estimate of its real value (take care, especially for fields like Biology and Chemistry in which 12+ authors is the usual number). For having a finer-grained value the Addon Advanced Interface allows to semi-automatically fix the number of authors to the accurate value (see above). 
'''Normalization per age''': if paper ''i'' has been cited ''t'' times, and has been written in 2001, its number of normalized citations per age is ''t/(CY-2001+1)'', for ''CY'' the current year. The above corresponds to the contemporary h-index of Sidiropoulos et al., with parameters delta=1 and gamma=1. Note that indices (h,g and e) computed on values normalized per age do not coincide with any of the Age Weighted metrics displayed by Publish or Perish, nor with hc-index (which is the contemporary h-index with delta=1 and gamma=4). Although this metric can be subject of criticism (values will abruptly change each Jan 1st 0:00:01; citations on old papers become exponentially less influential year by year), we found it as having an intuitive simmetry and interpretation, compared with plain indices, and compared with indices normalized per co-authorship. === Why my h-index is not what I expect? === Computed indices values might differ from those of software tools like Publish or Perish mainly because Publish or Perish is hardwired to query {{{http://scholar.google.com}}} no matter which is your actual locale, while the H-Index Calculator works on the locale of your choice (scholar.google.it, scholar.google.co.uk etc.). Results from your local {{{scholar.google.*}}} might differ. Also note that queries submitted to Google Scholar such as {{{author:"John Doe"}}} return different (and tighter) results than {{{author:John author:Doe}}}, and also different from {{{author:J author:Doe}}}. Also you should take into consideration that selecting the field of expertise and narrowing your search with the filters available in the Scholar Advanced Interface will change indices values. === Author lists refinement === This function allows to (semi)-automatically compute accurate normalized indices, overcoming the underestimate of 4 authors in case of multi-authored papers with 4+ co-authors. 
If Scholar Preferences are set to display Bibtex data URLs, the advanced interface displays a new control named ''Refine this author list'' per each paper. Given paper P, acting on its corresponding ''Refine this author list'' button will fill the P entry with its full list of authors, and displays the full name of the journal/conference of P (if data is available). Indices are automatically updated accordingly. It is also available a button named ''Refine all bibliographic entries'', which will automatically perform the abovementioned refinement per each displayed paper. ''Be warned that refining all papers implies heavy traffic from your browser to the Google Scholar portal, and might make Scholar detect you as an automated software, subsequently asking for a captcha.'' === Version 3.0: complete (aggressive) author list refinement === Google Scholar displays only some authors for a given paper P, and this affects the Calculator estimates on the number of authors. Since version 2.1, the Calculator can complete the list of all authors for P: this feature is called '''Author Refinement''' and uses BibTeX data (provided by Scholar). It works only if Scholar Preferences are set to display BibTeX data for the selected paper. Note that BibTeX data may be incomplete: in this case, BibTex author lists are terminated with the string ''"others"'' meaning that the list is incomplete. Since the 3.0 Release, the Calculator implements a new system that ''aggressively'' completes the list of the authors. We can explain generally how the system acts: it downloads the Web page corresponding to the paper (whose link is provided by Scholar), reasons inductively on it and extracts the remaining authors initially not present. To use the new aggressive refinement system, just click on the button labeled {{{Refine This Author List}}} near each record on Scholar, as in previous versions. We are currently measuring the precision and recall of the system, with accurate experiments. 
We can preliminarily say that the system works with a 100% rate precision in most cases. However, aggressive refinement ''is strongly based on the Web page referenced''. This means that if the page does not actually lists the authors for a paper, it is corrupted, it is in PDF format (not supported yet), or it is temporarily unreachable on the Internet, the aggressive refinement system, of course, will not be able to extract accurate data and will report an error message. === Version 3.0: Custom formula editing === As of Calculator 3.0, there is the possibility for users to add their own bibliometric formulas and display their outcome next to default indices. There are two types of custom formulas: [[#normalizations|Normalizations]] and [[#indices|Indices]]. |
<<Anchor(refinement)>> === Author lists completion === This function allows you to (semi-)automatically compute accurate normalized indices, overcoming the underestimate of 4 authors in case of multi-authored papers with 4+ co-authors. The advanced interface displays a control named ''Refine this author list'' for each paper. Given paper P, acting on its corresponding ''Refine this author list'' button will fill the P entry with its full list of authors, and display the full name of the journal/conference of P (if data is available). Normalized indices are automatically updated accordingly. Note that the "Refine author list" button appears only if Google Scholar Preferences are set to display Bibtex data URLs. A button named ''Refine all bibliographic entries'' is also available, which will automatically perform the above-mentioned refinement for each displayed paper. ''Be warned that refining all papers implies heavy traffic from your browser to the Google Scholar portal, and might make Scholar detect you as automated software, subsequently asking for a captcha.'' === How the author list completion works === The Calculator implements a system that ''aggressively'' completes the list of the authors. In general terms, the system acts as follows. First, it looks at the paper Bibtex record. This can, however, be incomplete: in such a case the Calculator downloads the Web page corresponding to the paper (whose link is provided by Scholar), and uses our Artificial Intelligence engine on it to find the complete author list. The list completion system works with good precision. However, note that the author list completion ''is strongly based on the Web page referenced''. This means that if the page does not actually list the authors for a paper, is corrupted, is in PDF format (not supported yet), or is temporarily unreachable on the Internet, the author list completion system will not be able to extract accurate data and will report an error message. 
=== How to design custom bibliometric formulas and normalizations === Users can add their own bibliometric formulas and display their outcome next to default indices. There are two types of custom formulas: [[#normalizations|Normalizations]] and [[#indices|Indices]]. Custom formulas are visible only when the Advanced interface is enabled. |
Line 95: | Line 146: |
* 'none' : no normalization. The normalized citations of a paper correspond to those displayed (after subtracting self citations). Same as the custom formula {{{citations-selfCitations}}}. * 'by authors': the citations of each paper are normalized by the (estimated) number of authors. Same as the custom formula {{{(citations-selfCitations)/authors}}}. For instance a paper with 100 citations and 4 authors, will score a number of normalized citations of 25. The number of authors cannot be always estimated correctly unless the refinement function is used. You might want to read the [[#refinement|Author Refinement]] section about how the Calculator estimates the number of authors per each paper. * 'by age': if paper {{{i}}} has been cited {{{t}}} times, and has been written in {{{2001}}}, its number of normalized citations per age is {{{t/(CY-2001+1)}}}, for CY the current year. Same as the custom formula {{{(citations-selfCitations)/(thisYear-year+1)}}}. As an example, a paper scoring 100 citations and written in 2003, would score 10 normalized citations in 2012. You can add your own normalization formulas by clicking on the button 'New normalization' on the bottom of the Information box. Two editable textfields will appear. Enter the normalization name in the leftmost field and your custom formulas in the rightmost. Click anywhere else when ready, and if your formula is correct, you should see a new row in which all the available indices are computed according to your new normalization notion. Enjoy! |
* ''none'' : no normalization. The normalized citations of a paper correspond to those displayed (after subtracting self citations). Same as the custom formula {{{citations-selfCitations}}}. * ''by authors'': the citations of each paper are normalized by the (estimated) number of authors. This is the same as the custom formula {{{(citations-selfCitations)/authors}}}. For instance a paper with 100 citations and 4 authors, will score a number of normalized citations of 25. The number of authors cannot be always estimated correctly unless the refinement function is used. You might want to read the [[#refinement|Author Refinement]] section about how the Calculator estimates the number of authors per each paper. * ''by age'': if paper {{{i}}} has been cited {{{t}}} times, and has been written in {{{2001}}}, its number of normalized citations per age is {{{t/(CY-2001+1)}}}, for CY the current year. Same as the custom formula {{{(citations-selfCitations)/(thisYear-year+1)}}}. As an example, a paper scoring 100 citations and written in 2003, would score 10 normalized citations in 2012. You can add your own normalization formulas by clicking on the button 'New normalization' on the bottom of the Information box. Two editable textfields will appear. Enter the normalization name in the leftmost field and your custom formulas in the rightmost. ''Click anywhere else'' when ready, and if your formula is correct, you should see a new row in which all the available indices are computed according to your new normalization notion. Enjoy! |
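The three default normalizations above can be sketched in a few lines of Python. This is only an illustration of the formulas quoted in the list, not the Calculator's actual code: the {{{Paper}}} record and the function names are hypothetical, and the h-index helper simply counts how many ranked papers have at least as many (normalized) citations as their rank.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    citations: int       # "Cited by" count shown by Google Scholar
    self_citations: int  # citations stripped before normalizing
    authors: int         # (estimated) number of authors
    year: int            # publication year


def norm_none(p: Paper) -> float:
    # Custom formula: citations-selfCitations
    return p.citations - p.self_citations


def norm_by_authors(p: Paper) -> float:
    # Custom formula: (citations-selfCitations)/authors
    return (p.citations - p.self_citations) / p.authors


def norm_by_age(p: Paper, this_year: int) -> float:
    # Custom formula: (citations-selfCitations)/(thisYear-year+1)
    return (p.citations - p.self_citations) / (this_year - p.year + 1)


def h_index(values) -> int:
    # Largest h such that h papers have at least h (normalized) citations.
    ranked = sorted(values, reverse=True)
    return sum(1 for rank, v in enumerate(ranked, start=1) if v >= rank)
```

With these definitions, a paper with 100 citations and 4 authors scores 25 normalized citations by authors, and a paper with 100 citations written in 2003 scores 10 normalized citations by age in 2012, matching the examples in the list. Each normalization produces its own row of indices: {{{h_index}}} is simply re-run on the normalized values.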
Line 104: | Line 157: |
Line 107: | Line 161: |
Line 108: | Line 163: |
Line 109: | Line 165: |
Line 110: | Line 167: |
Line 111: | Line 169: |
* {{{age}}} : a shortcut for {{{(thisYear-year+1)}}}. | * {{{age}}} : a shortcut for {{{(thisYear-year+1)}}} (this year's papers are assumed to have age 1, as in most bibliometric literature). |
Line 116: | Line 176: |
* {{{+}}}, {{{-}}}, {{{/}}}, {{{*}}}, {{{^}}}, {{{(}}}, {{{)}}}, with intuitive meaning ({{{^}}} is exponentiation). The square root of {{{x}}} can be easily obtained as {{{x^0.5}}}. ==== Some further examples ====: |
* {{{+}}}, {{{-}}}, {{{/}}}, {{{*}}}, {{{^}}}, {{{(}}}, {{{)}}}, with intuitive meaning ({{{^}}} is exponentiation). The square root of {{{x}}} can be obtained as {{{x^0.5}}}. ==== Some further examples ==== |
Line 124: | Line 184: |
Custom formulas are visible only when the Advanced interface is enabled. <<Anchor(indices)>> | <<Anchor(indices)>> |
Line 130: | Line 190: |
You should be aware that indices formulas are applied on the current sorted list of papers (usually the list of entries displayed on video, sorted by the normalization at hand). The current list of papers can be changed either by ''a)'' making a new query, or ''b)'' telling the Calculator that you want to add more data to the current set, by clicking over the appropriate links presented by the Calculator. | Unlike normalization formulas, indices formulas are applied on the current sorted list of papers (usually the list of entries displayed on screen, sorted by the number of normalized citations at hand). The current list of papers can be changed either by ''a)'' making a new query, or ''b)'' telling the Calculator that you want to add more data to the current set, by clicking on the appropriate links presented by the Calculator. |
Line 138: | Line 198: |
Line 139: | Line 200: |
Line 140: | Line 202: |
Line 141: | Line 204: |
Line 142: | Line 206: |
Line 149: | Line 214: |
Line 150: | Line 216: |
Line 164: | Line 231: |
Where {{{start}}} and {{{end}}} are expressions denoting respectively the numeric range which {{{variable}}} will sweep on; {{{variable}}} is an identifier of choice, which is allowed to appear in {{{expression}}}. A {{{booleanExpression}}} is in the form {{{expr relOp expr}}} where {{{relOp}}} can be one among {{{<}}},{{{>}}},{{{>=}}},{{{<=}}},{{{==}}},{{{!=}}}. | Where {{{start}}} and {{{end}}} are expressions denoting, respectively, the bounds of the numeric range over which {{{variable}}} sweeps; {{{variable}}} is an identifier of choice, which is allowed to appear in {{{expression}}}. A {{{booleanExpression}}} is in the form {{{expr relOp expr}}} where {{{relOp}}} can be one among {{{<}}} (less than), {{{>}}} (greater than), {{{>=}}} (greater than or equal to), {{{<=}}} (less than or equal to), {{{==}}} or {{{=}}} (equal to), {{{!=}}} or {{{<>}}} (different from). |
Line 178: | Line 246: |
As in normalization formulas, allowed expression comprehend {{{+}}}, {{{-}}}, {{{/}}}, {{{*}}}, {{{^}}}, {{{(}}}, {{{)}}}, with intuitive meaning. | As in normalization formulas, allowed algebraic expressions include {{{+}}}, {{{-}}}, {{{/}}}, {{{*}}}, {{{^}}}, {{{(}}}, {{{)}}}, with intuitive meaning. |
Line 183: | Line 252: |
Line 184: | Line 254: |
Line 185: | Line 256: |
Line 186: | Line 258: |
Line 187: | Line 260: |
Line 188: | Line 262: |
Line 189: | Line 264: |
* Citations per year since first publications {{{sum(1,N,i,citations[i])/max(1,N,i,year[i]>1900,age[i])}}} | * Citations per year since first publications {{{sum(1,N,i,citations[i])/max(1,N,i,year[i]>1900,age[i])}}} (1900 is arbitrarily chosen as filter year). |
Line 199: | Line 275: |
Here {{{sum(1,N,i,citations[i])}}} is the total number of citations reported, while {{{max(1,N,i,year[i]>1900,age[i])}}} estimates the academic age of an author as the age of the oldest paper appearing in the current set of papers. | Here {{{sum(1,N,i,citations[i])}}} is the total number of citations reported, while {{{max(1,N,i,year[i]>1900,age[i])}}} estimates the academic age of an author as the age of the oldest paper appearing in the current set of papers. A paper is selected only if its year is greater than {{{1900}}} in order to exclude papers with unknown date from the computation. |
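The "citations per year since first publication" formula can be read as follows in Python. This is a hedged sketch of what the formula computes, not the Calculator's parser: the function name is hypothetical, papers are held in plain lists, and an unknown publication year is assumed to be stored as a value not greater than 1900 (here 0), which the filter {{{year[i]>1900}}} then discards.

```python
def citations_per_year(citations, years, this_year):
    """Evaluate sum(1,N,i,citations[i]) / max(1,N,i,year[i]>1900,age[i])."""
    n = len(citations)

    # sum(1,N,i,citations[i]): i sweeps 1..N (lists are 0-based, hence i - 1).
    total = sum(citations[i - 1] for i in range(1, n + 1))

    # max(1,N,i,year[i]>1900,age[i]): only papers passing the filter contribute;
    # age[i] is the shortcut for thisYear - year[i] + 1.
    ages = [
        this_year - years[i - 1] + 1
        for i in range(1, n + 1)
        if years[i - 1] > 1900
    ]
    if not ages:
        return None  # no paper with a known year: academic age is undefined

    return total / max(ages)
```

For instance, three papers with 30, 15 and 5 citations, published in 2005, 2010 and an unknown year, give 50 citations over an academic age of 10 when evaluated in 2014, that is 5 citations per year. Note that the undated paper still contributes its citations to the total; only its age is excluded.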
Line 201: | Line 279: |
Line 203: | Line 282: |
<<Anchor(clustering)>> === How co-authors clustering works === The Calculator automatically groups papers based on their likelihood of being written by the same author. This helps in reconstructing author careers when homonymies come into play. You should see in the left pane a number of ''co-authors clusters'', each of which associates a group of papers with a group of co-authors. Papers corresponding to a given cluster can be conveniently selected and de-selected by using the "Select/Deselect" Papers button. Your selection can be reset using the button "Reset selections". '''How it works''': at the moment, clusters are built by looking up author names which are in common between papers, and by looking at common keywords between paper titles. For instance, if paper {{{P0}}} has been co-authored by authors {{{A,B,C}}} and paper {{{P1}}} by authors {{{D,C,B,K}}}, {{{P0}}} and {{{P1}}} will be grouped in the same cluster since they share {{{B}}} and {{{C}}} as common authors (in order to be clustered together, two papers must share at least two authors). Papers are also clustered when they share at least two keywords in common, like in e.g. "'' '''Answer Set Programming''': a Primer''" and "''A uniform integration of higher-order reasoning and external evaluations in '''answer set programming''' ''". We are working at improving our clustering algorithms, thus feedback is welcome! === Automatic Self-citation cleaning === It is possible to clean each individual paper citation count from what are recognized as self-citations. A paper S is considered a self-citation of P, if P and S share at least one author. Author matching is done by comparing Google Scholar unique author IDs. Note that, for authors not having a personal account on Google Scholar, and thus not having a unique author ID, self-citations are detected by comparison of the author's textual names, thus the cleaning process can be in principle affected by homonymies. 
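The self-citation rule just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the Calculator's implementation: the {{{Author}}} record and the function names are hypothetical, and the textual fallback compares names after trimming and case folding, which is our own simplification of "comparison of the author's textual names".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Author:
    name: str
    scholar_id: Optional[str] = None  # None when the author has no Scholar profile


def same_author(a: Author, b: Author) -> bool:
    # Prefer the unique Scholar IDs when both authors have one.
    if a.scholar_id is not None and b.scholar_id is not None:
        return a.scholar_id == b.scholar_id
    # Otherwise fall back to textual names (homonyms can produce false matches).
    return a.name.strip().casefold() == b.name.strip().casefold()


def is_self_citation(paper: List[Author], citing: List[Author]) -> bool:
    # S is a self-citation of P if P and S share at least one author.
    return any(same_author(a, b) for a in paper for b in citing)


def clean_citations(
    paper: List[Author], citing_papers: List[List[Author]]
) -> Tuple[int, int]:
    # Returns (valid, invalid) citation counts for paper P.
    invalid = sum(1 for citing in citing_papers if is_self_citation(paper, citing))
    return len(citing_papers) - invalid, invalid
```

Two authors with the same name but different Scholar IDs are kept apart, while an author without an ID is matched by name only, which is exactly where homonymies can inflate the invalid count.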
'''How to use it''': if the advanced interface is enabled, it is possible to clean the self-citations of a specific paper P by clicking on the respective ''Clean citations'' button. This will sweep across the papers citing P and eliminate self-citations from the citation count of P. After the cleaning process has finished, you should notice that the ''Invalid citations'' field in P's record is incremented, while a label counting ''Valid'' citations appears close to the usual "Cited by" field. All the bibliometric indices are updated accordingly. The feature is available on individual papers only and might be slow because of the necessary query [[#throttling|Throttling]]. ---- == Frequently asked questions == === Why is my h-index not what I expect? Why do displayed results differ by day/browser/language? === Computed indices values might differ from those of software tools like Publish or Perish mainly because Publish or Perish is hardwired to query {{{http://scholar.google.com}}} no matter what your actual locale is, while the H-Index Calculator works on the locale of your choice (scholar.google.it, scholar.google.co.uk etc.), and results from your local {{{scholar.google.*}}} might differ. Also note that queries submitted to Google Scholar such as {{{author:"John Doe"}}} return different (and tighter) results than {{{author:John author:Doe}}}, and also different from {{{author:J author:Doe}}}. Moreover, you should take into consideration that selecting the field of expertise and narrowing your search with the filters available in the Scholar Advanced Interface will change indices values. 
It is recommended, when collecting bibliometric data, to record: * The day of the collection activity; * The national locale which has been used ({{{.com}}},{{{.it}}}, etc.); * The browser name and version; * Whether and which filters where in place (by year, by author, by scientific discipline, none at all); * The list of papers on video which the bibliometric indices where computed on. ''Technical'': for a sounder, uniform and comparable analysis, not a bad idea recording the whole content of the HTTP Requests sent, especially the {{{User-Agent}}}, {{{Cookie}}}, {{{Host}}} and {{{Accept-Language}}} fields. === Hello, I'm professor John Doe. Is there a chance I can tell my papers apart from all the other John Does in the academy? === The automated disambiguation of author homonymies is an hot research topic right now. You might want to give a try to the [[#clustering|Author Cluster]] feature of the Calculator. This helps in grouping papers in clusters of authors which co-authored some paper together, and is usually very helpful in telling your past papers apart from your homonyms. Some manual adjustment might be needed. === Can I use the Calculator on keywords other than author names? === Yes, you can in principle measure the impact of technologies, scientific names, acronyms, journals, conferences and the hype keyword of the moment by using adequate keywords on Google Scholar and checking their impact using the Calculator. Of course your research should be conducted with some care, since inaccuracies are behind the corner: a clear and unambiguous methodology should be set beforehand. E.g. it should be clearly defined how to treat synonimies like 'www' and 'world wide web', and where and when data is collected. Recall again that Scholar data changes from locale to locale, and it is is continuously evolving over time. <<Anchor(throttling)>> === "After about 50 searches (argh), I get the following 'friendly' message from Google: "We're sorry... ... 
but your computer or network may be sending automated queries" === === I get a captcha asking whether I am a robot! === '''This is not because of the presence of the extension in your browser'''. You must be aware that Google Scholar prevents massive querying on its servers. So, as soon as you perform massive activity, i.e. ''lot of queries in a very short time'', no matter whether the Calculator is installed or not, you will get the above message. In order to mitigate your temptations of spending the night looking figures of all your colleagues, the Calculator has an internal query rate throttling mechanism which reduces the possibility of massive user activity: there are however some things you can do to further mitigate the issue. These are: 1. '''Don't query quickly''', the Scholar portal is not conceived for massive automated data collection activities, but for normal, human, users; 1. '''Use wisely the function "add ALL the information"'''. This function collects data from Scholar's additional pages by issuing for you 1 additional query for each additional page. For instance if you ask for the author name {{{Carlo Rubbia}}}, you will get 4740 results (as of Jan 2013), only 10 or 20 of which displayed by default. The feature "add ALL information" is capable to collect results up to the 1000th item, by issuing for you all the additional queries to the Scholar portal. This of course increments your traffic rate to and from Google Scholar servers. Use the function "add X results" instead. This latter adds only a limited number of results. 1. '''Use wisely the feature "refine ALL bibliographic entries"'''. Although this feature is very powerful and can complete partial co-author lists and partial paper information, you must be aware that the feature issues 1 additional query per paper (i.e. 100 queries on a page with 100 results), thus augmenting traffic to and from Google Scholar servers. 
Use the feature "Refine this author list" instead, which works on a single paper using just 1 additional query. 1. '''Use wisely the clean citations button'''. Although the query throttling system will brake this process, still use this function wisely. === Can I quickly get back the access to Google Scholar once I'm blocked? === Usually it is sufficient to fill the CAPTCHA proposed by Google Scholar, or, when no captcha is shown, just wait a few minutes. As of 2016 (but one should assume that Google Bot Detectors will change their strategy over time), one can quickly regain access by either: -Opening a new anonymous session (CTRL+SHIFT+N) -Clearing your cookies cache (Google "Clear my cookies cache") -Changing your public IP address (Do these actions only if you understand their technical implications) === My search results in ">10" values, can I get a specific value instead of just ">10" ? === Yes, just add more results to the displayed page acting on the links appearing in the text which look like {{{Want to add *10*, *100* or all results ?}}}. We are sorry of not displaying 100 results by default, but this is due to the recent Google Scholar policy change (not the Calculator) of allowing maximum 10 or 20 results per query. === I'm getting a 'Error: document.getElementById(...) is null' error when running the addon === You're most probably running an obsolete version of the addon, most likely under Mozilla Firefox. You should switch to the [[#download|Chrome version]]. === How can I remove the Calculator? === You can temporarily disable the Calculator by acting on the 'disable' toggle which appears right on top of the Calculator information box. For permanent removal: * Mozilla Firefox: go on Tools -> Additional Components -> Extensions, then select and disable/remove the Calculator, using the apposite button. 
* Google Chrome: Menu -> Settings -> Extensions, then either use the 'Trash' icon for permanent removal, or the checkbox for temporarily disabling the Calculator. === Which permissions the Calculator asks for and why? === The Calculator asks for: * Access on all web sites: this is used for completing author lists by looking on the Web (digital libraries, etc.). '''No user data is sent to any external website'''. * Possibility to add items to context menus: used for letting the user query any text selection on any web site when right-clicking on some selected portion of text. * Opening tabs: needed for opening new Google Scholar windows. === How can I cite the Calculator on my scientific publication? === You can use the following: Ianni, G. et al. (2009) Scholar H-Index Calculator, available from {{{https://www.mat.unical.it/ianni/wiki/ScholarHIndexCalculator}}} ---- |
|
Line 204: | Line 397: |
---- === Support for CiteseerX === Automatic index calculation when visiting CiteseerX has been discontinued since version 2.0. |
|
Line 210: | Line 398: |
May-Jun 2012. 3.0 Release with many new features: * Possibility to add custom normalization and indices formulas see the [[#customformulas|Custom Formulas]] section * 'Refine author list' and 'Refine all bibliographic entries' functions now much more accurate (can correctly extract lists of thousands of authors in almost all cases) * Can now compute h-index values greater than 100 * Support for the new Scholar Modern look * Many bug fixes and internal code optimization |
* Oct 15th, 2017: Ver 4.2.5 Adapted to the newer Google Scholar layout. * Jul 20th, 2017: Ver 4.2.3 Fix for properly handling UTF-8 diacritics in author names. * Feb 15th, 2016: Ver 4.2.2 Added splash screen. * Jan 30th, 2015: Ver 4.2.1 Small fix for supporting the nowadays default https URL scheme. * Jan 20th, 2015: Ver 4.2. Introduced automatic self-citations cleaning. * November 2013: Ver 4.0. New user interface and paper clustering by co-authors groups (experimental) * July 20th, 2013: Ver 3.4. Added context menu: users can select text and comfortably right-click, then choose on the context menu for querying the selected text on Google Scholar. * July 5th, 2013: Ver 3.3. Added query rate throttling code. Prevents user from querying Google Scholar at an excessive pace (thus triggering bot blocking). * May 10th, 2013: Added show/hide toggle. Do not show bibliometric data on demand. *May-Jun 2012. 3.0 Release with many new features: * Possibility to add custom normalization and indices formulas see the [[#customformulas|Custom Formulas]] section * 'Refine author list' and 'Refine all bibliographic entries' functions now much more accurate (can correctly extract lists of thousands of authors in almost all cases) * Can now compute h-index values greater than 100 * Support for the new Scholar Modern look * Many bug fixes and internal code optimization |
Line 233: | Line 430: |
---- == Related Work == <<Anchor(publications)>> ---- == Selected Publications == |
== Publications == * Francesco Cauteruccio and Giovambattista Ianni. ''A domain meta-wrapper using seeds for intelligent author list extraction in the domain of scholarly articles''. [[http://www.tpdl2013.info/|TPDL 2013]]. LNCS 8092 , pp. 313--318 , 2013. * Francesco Cauteruccio and Giovambattista Ianni. ''A domain meta-wrapper using seeds for intelligent author list extraction in the domain of scholarly articles''. [[http://www.mat.unical.it/ianni/storage/HCalc-TR-2013-1-Long.pdf|Longer]] Technical Report. |
Scholar H-Index Calculator for Google Chrome - Official documentation
How does the Calculator work? I've installed it and I see no button!
ANSWER: Point your browser to http://scholar.google.com and make a search. You will notice a LOT of NEW STUFF.
You will be redirected to this very same page only the first time you install the Calculator, or when an automatic upgrade has been installed. Apologies for the inconvenience, and have a look at the docs! (or maybe not).
About
Scholar H-Index Calculator (the Calculator from now on) is a bibliometric and citation analysis tool which works as an addon for Google Chrome.
Features
- Computes most common bibliometric indices over Google Scholar pages on-the-fly
- Lets you program your own bibliometric formulas
- Can automatically clean self and ghost citations from computed indices
- Can complete author lists when Google Scholar truncates to the first 4 authors
- Can group papers based on possible authors' homonymies
- Fast and immediate, no contact with other servers besides Google Scholar
- No personal data, no activity of yours is logged
How to use it
Getting started with the Calculator takes just two steps:
1. Download and install the Calculator using your web browser (Google Chrome is the only supported browser at the moment)
2. Point your browser to Google Scholar and make a query! Once installed, the addon displays the h-index, g-index, e-index and other measures of impact for the submitted query at the top of Google Scholar result pages.
Download
Version 3.1 for Firefox. Note: development for Firefox has been discontinued. New features will be available for Google Chrome only.
The Team
Project Coordinator
Giovambattista Ianni (Code design, programming, reviewing, maintenance and refactoring)
Developers Team (in no particular order)
Mauro Ceraso (author clustering engine, new user interface), Massimo Canonaco (self and ghost citations cleaning), Francesco Cauteruccio (Aggressive refining engine), Susanna Cozza (General code maintenance), Stefano Germano (Custom formulas parser), Maria Carmela Santoro (General code refactoring, Additional results browsing code).
Are you a research or industrial investor interested in financing the development of the Calculator? Do you need directions or have comments on the Calculator? You can contact us at shi_AT_mat.unical.it (replace _AT_ with a '@' to obtain our mail address).
Documentation
Disclaimer
The computed values are obtained from Google Scholar output (and from the Web) as is, and might include self-citations, inaccuracies, informal and ghost citations. The author does not take any responsibility for the accuracy of index values, nor for the blind, non-manually-cleaned and inaccurate usage of this tool in official comparisons between authors and/or between journals.
You might be here because you are convinced that "high h-index = good scientist". Wrong. Have a look at this. If, on the other hand, you wrongly think that bibliometric analyses bring no information at all, read, e.g., this.
Tips
To get accurate results, set the number of results in your Google Scholar preferences to its maximum, which is 20 papers per page, as of 2017.
Google Scholar terms of service prohibit automatic batch querying, thus the Calculator processes just the displayed result page. The Calculator can parse papers beyond the displayed ones: click on the "Want to add 20, 100, or all result?" question to add 20 or 100 results.
When clicking on all, all the result pages, up to the 1000th paper, will be added and indices computed accordingly. You will see the added papers at the bottom of the page. If you aim at computing your own index values accurately, I strongly suggest using both the addon Advanced interface and the Advanced Scholar Search form already provided by Google Scholar.
In the Advanced Scholar Search page, fill the Return articles written by.. field with the name at hand in quotes (e.g. "Giovambattista Ianni"). Also, in this same form, you might want to restrict the search to the relevant field of expertise. It is then possible to use the new Addon Advanced Interface (it can be activated with the link below the query textfield) for performing fine-grained analysis (toggling papers on and off, etc.)
What are delta-H and delta-G ?
These two values measure the minimum number of citations needed for incrementing the current h-index (g-index, respectively), by 1.
delta-H is computed as (h+1) - c[h+1] + sum(i from 1 to h) max((h+1) - c[i], 0), where c[i] is the number of citations of the paper in position i, and h is the current h-index.
delta-G is computed as (g+1)^2 - sum(i from 1 to g+1) c[i], where g is the current g-index.
Note that for increasing the h-index by 1, one has to obtain delta-H new citations on those particular papers which fail to have h+1 citations: all the first h+1 papers must reach at least h+1 citations. Usually, the (h+1)-th paper is the main culprit for the value of delta-H. Gaining a citation on a paper already having h+1 citations does not help in decreasing delta-H; conversely, for increasing the g-index, new citations on the first g+1 papers matter, no matter how they are distributed.
delta-H and delta-G are meant to measure how difficult it would be for the author at hand to increase his/her h-index and g-index. Note that the range of delta-H and delta-G is relatively small (in the worst scenario, delta-H-max = 2h+1 and delta-G-max = 2g+1).
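The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the Calculator's actual code: the function names are made up for the example, and a missing (h+1)-th or (g+1)-th paper is assumed to count as a paper with 0 citations.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g (capped at the number of papers) such that the top g
    papers total at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def delta_h(citations):
    """(h+1) - c[h+1] + sum over the first h papers of max((h+1) - c[i], 0)."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    target = h + 1
    # Assumption: a missing (h+1)-th paper counts as having 0 citations.
    next_paper = ranked[h] if h < len(ranked) else 0
    return (target - next_paper) + sum(max(target - c, 0) for c in ranked[:h])


def delta_g(citations):
    """(g+1)^2 - sum of the citations of the first g+1 papers."""
    ranked = sorted(citations, reverse=True)
    g = g_index(ranked)
    # Slicing past the end of the list is safe: a missing paper adds 0.
    return (g + 1) ** 2 - sum(ranked[: g + 1])
```

For five papers with 10, 6, 4, 2 and 1 citations, this gives h = 3, g = 4, delta-H = 2 (two more citations on the fourth paper) and delta-G = 2.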
Normalized Values
Normalized values (the rows of data starting from the second line in the calculator table) are computed by normalizing the number of citations found per each paper. There are two default types of normalization:
Normalization per author: if paper i has been cited t times, and has been written by k authors, its number of normalized citations is t/k.
All the index values, in the row where data are normalized per author, are computed considering these normalized values. In particular the normalized h-index corresponds to h_{I,Norm} of Publish or Perish. Due to limitations on the Google Scholar output format, Scholar H-Index truncates to 4 (or 5) the author count for papers having more than 4 authors. In such a case, the presented h_{I,Norm} has to be taken as an upper bound estimate of its real value (take care, especially for fields like Biology and Chemistry in which 12+ authors is the usual number). To obtain a more accurate value, the Addon Advanced Interface lets you semi-automatically fix the number of authors (see above).
Normalization per age: if paper i has been cited t times, and has been written in 2001, its number of normalized citations per age is t/(CY-2001+1), for CY the current year.
The above corresponds to the contemporary h-index of Sidiropoulos et al., with parameters delta=1 and gamma=1. Note that indices (h,g and e) computed on values normalized per age do not coincide with any of the Age Weighted metrics displayed by Publish or Perish, nor with hc-index (which is the contemporary h-index with delta=1 and gamma=4).
Although this metric can be subject to criticism (values will abruptly change each Jan 1st 0:00:01; citations on old papers become exponentially less influential year by year), we find that it has an intuitive symmetry and interpretation, compared with plain indices, and compared with indices normalized per co-authorship.
How to use the Advanced Interface
The advanced interface mode can be enabled by clicking on the corresponding link located below the query textfield. Note that the addon advanced interface is not the same as the pre-existing Advanced Scholar Search link located at the right of the query textfield. When the Addon Advanced interface is enabled, a set of controls is visible for each paper. It is possible:
- to select or deselect a single paper: deselected papers do not contribute to the computation of impact indices.
- to manually increase or decrease the number of self citations. Self citations are stripped from the total citation count of a given paper. The 'clean citations' button can compute self citations for you, but it can be slow.
- to manually increase or decrease the number of authors for a given paper: this is useful when the number of authors reported by Google Scholar is not accurate and manual fixing is required (typical when the number of authors exceeds 4-5). Note: semi-automatic author fixing is possible since version 2.1 (see Section below).
- to load and save data: users can save their data analysis by simply saving the page at hand after completing their analysis.
Author lists completion
This function lets you (semi-)automatically compute accurate normalized indices, overcoming the underestimate of 4 authors in case of multi-authored papers with 4+ co-authors. The advanced interface displays a control named Refine this author list for each paper. Given paper P, acting on its corresponding Refine this author list button will fill the P entry with its full list of authors, and display the full name of the journal/conference of P (if data is available). Normalized indices are automatically updated accordingly.
Note that the "Refine author list" button appears only if Google Scholar Preferences are set to display Bibtex data URLs.
A button named Refine all bibliographic entries is also available: it automatically performs the refinement described above for each displayed paper. Be warned that refining all papers implies heavy traffic from your browser to the Google Scholar portal, and might make Scholar detect you as automated software and subsequently ask for a captcha.
How the author list completion works
The Calculator implements a system that aggressively completes the list of the authors. In outline, the system works as follows.
First, it looks at the paper's Bibtex record. This can however be incomplete: in such a case the Calculator downloads the Web page corresponding to the paper (whose link is provided by Scholar), and runs our Artificial Intelligence engine on it to find the complete author list.
The list completion system works with good precision. However, note that the author list completion is strongly based on the referenced Web page. This means that if the page does not actually list the authors of a paper, is corrupted, is in PDF format (not supported yet), or is temporarily unreachable on the Internet, the author list completion system will not be able to extract accurate data and will report an error message.
How to design custom bibliometric formulas and normalizations
Users can add their own bibliometric formulas and display their outcome next to default indices. There are two types of custom formulas: Normalizations and Indices. Custom formulas are visible only when the Advanced interface is enabled.
Normalizations
In the Calculator information box, each row shows bibliometric indices depending on a given Normalization. Each normalization weighs citations of each paper depending on a given criterion. There are three default normalizations:
- none: no normalization. The normalized citations of a paper correspond to those displayed (after subtracting self citations). Same as the custom formula citations-selfCitations.
- by authors: the citations of each paper are normalized by the (estimated) number of authors. This is the same as the custom formula (citations-selfCitations)/authors. For instance, a paper with 100 citations and 4 authors will score 25 normalized citations. The number of authors cannot always be estimated correctly unless the refinement function is used. You might want to read the Author Refinement section about how the Calculator estimates the number of authors for each paper.
- by age: if paper i has been cited t times, and has been written in 2001, its number of normalized citations per age is t/(CY-2001+1), for CY the current year. Same as the custom formula (citations-selfCitations)/(thisYear-year+1). As an example, a paper scoring 100 citations and written in 2003 would score 10 normalized citations in 2012.
You can add your own normalization formulas by clicking on the button 'New normalization' on the bottom of the Information box. Two editable textfields will appear. Enter the normalization name in the leftmost field and your custom formulas in the rightmost. Click anywhere else when ready, and if your formula is correct, you should see a new row in which all the available indices are computed according to your new normalization notion. Enjoy!
Custom Normalization Formulas Language
You should be aware that normalization formulas are applied on a per-paper basis: your normalization formulas are intended to work in the context of a single paper. For a paper i, a custom normalization formula f(i) returns a number of citations, depending on how f behaves. A normalization formula can access the following attributes of the paper i:
- citations: the number of citations for i (as this value appears on video).
- year: the year of publication of i (as it appears on video. Conventionally set to '-100,000,000,000,000' if not present).
- authors: the number of authors of i. This is estimated according to how the author list appears on video, and can be manually edited by clicking on the Authors field or acting on the 'Auth+' and 'Auth-' buttons for the paper i. See the Author Refinement section.
- selfCitations: the number of self citations of i, as they appear in the 'Self Citations' editable text field. Defaults to 0.
- cleanCitations: a shortcut for (citations-selfCitations).
- age: a shortcut for (thisYear-year+1) (this year's papers are assumed to have age 1, as in most bibliometric literature).
- thisYear: current year, as of your PC's wall clock.
Allowed symbols:
+, -, /, *, ^, (, ), with intuitive meaning (^ is exponentiation). The square root of x can be obtained as x^0.5.
Some further examples
Carbone's normalization: citations/(authors^0.5).
hc-index(delta,gamma) : gamma*citations/age^delta (replace gamma and delta with your favourite values)
Combined age and author weighting: citations/age/authors
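To make the per-paper semantics concrete, the sketch below mirrors the attributes and some of the formulas of this section in Python. It is an illustration only: the Paper class and the function names are hypothetical and are not part of the Calculator, and the current year is passed explicitly instead of being read from the wall clock.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    citations: int
    year: int
    authors: int
    self_citations: int = 0  # mirrors selfCitations, which defaults to 0

    @property
    def clean_citations(self):
        # Shortcut for (citations - selfCitations).
        return self.citations - self.self_citations

    def age(self, this_year):
        # Shortcut for (thisYear - year + 1): this year's papers have age 1.
        return this_year - self.year + 1


def by_authors(p):
    # Default normalization: (citations-selfCitations)/authors
    return p.clean_citations / p.authors


def by_age(p, this_year):
    # Default normalization: (citations-selfCitations)/(thisYear-year+1)
    return p.clean_citations / p.age(this_year)


def carbone(p):
    # Carbone's normalization: citations/(authors^0.5)
    return p.citations / p.authors ** 0.5


def hc(p, this_year, gamma=4, delta=1):
    # hc-index weighting: gamma*citations/age^delta
    return gamma * p.citations / p.age(this_year) ** delta
```

With Paper(citations=100, year=2003, authors=4), by_authors gives 25 and by_age evaluated for 2012 gives 10, matching the examples given for the default normalizations.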
Indices
Indices correspond to columns in the Calculator information box. Each column shows a bibliometric index computed on the basis of a given set of papers. Besides the default indices you can add your own.
Custom index formulas Language
Unlike normalization formulas, index formulas are applied to the current sorted list of papers (usually the list of entries displayed on video, sorted by the number of normalized citations at hand). The current list of papers can be changed either by a) making a new query, or b) telling the Calculator to add more data to the current set, by clicking on the appropriate links presented by the Calculator. For a sorted set of papers S, a custom index formula f(S) returns an index value, depending on how f behaves. An index formula can access the attributes of all the papers of the corpus.
For each row in the information box, the corresponding normalization function is applied beforehand, and the papers on video are first sorted according to their number of normalized citations; then f is computed for each row, according to the corresponding normalization and the resulting sorting. The language available for custom index formulas is much richer than the normalization language. The available constructs are listed next.
In the following, assume that a sorted list of papers S and a normalization function n(i), with i denoting the i-th paper of S, are given.
Special arrays:
citations[x] : the number of normalized citations for the x-th paper in S (i.e. n(x)). Note that the sorting of documents might be different for each normalization row: i.e. the formula citations[0] applied on the first row might refer to a different paper in the second row. Consider, e.g., a paper with 2000 citations which is the most cited for an author. If the paper also has a very large number of authors (e.g. more than 1000), it is very likely that the same paper will not be the top-most in the 'normalization per author' row.
year[x]: the year of publication of the x-th paper in S.
age[x]: a shortcut for (thisYear-year[x]+1)
authors[x]: the number of authors of the x-th paper in S.
selfCitations[x]: the number of self citations of the x-th paper in S (not normalized).
plainCitations[x]: the number of citations for the x-th paper in S, without any normalization applied.
x can be any allowed formula.
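The effect of per-row sorting on the special arrays can be seen with a tiny Python sketch. The two papers and their figures are invented for the illustration, and the ranked helper is hypothetical.

```python
papers = [
    {"title": "big collaboration", "citations": 2000, "authors": 1000},
    {"title": "small team", "citations": 300, "authors": 3},
]


def ranked(papers, normalize):
    # Each row sorts the same papers by its own normalized citation count.
    return sorted(papers, key=normalize, reverse=True)


# Row without normalization: the 2000-citation paper comes first.
plain = ranked(papers, lambda p: p["citations"])

# Row normalized per author: 2000/1000 = 2 now ranks below 300/3 = 100.
per_author = ranked(papers, lambda p: p["citations"] / p["authors"])
```

The same array position therefore refers to "big collaboration" in the first row and to "small team" in the per-author row.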
Special symbols:
N : the number of papers in S.
h, g, e, deltaH, deltaG : the value of the respective indices, obtained according to the citation normalization at hand.
thisYear : current year, as of your PC's wall clock.
Functions:
Aggregate functions are available: these come in two possible forms:
funcName(start,end,variable,expression)
or
funcName(start,end,variable,booleanExpression,expression)
Here start and end are expressions denoting the bounds of the numeric range that variable sweeps over; variable is an identifier of choice, which is allowed to appear in expression. A booleanExpression has the form expr relOp expr, where relOp can be one among < (less than), > (greater than), >= (greater than or equal to), <= (less than or equal to), == or = (equal to), != or <> (different from).
Currently available aggregate functions are min, max, sum and prod. To exemplify how aggregates work, assume a set of 5 papers with respectively 10, 6, 4, 2 and 1 citations. Then
min(1,N,i,citations[i]) = 1
max(1,N,i,citations[i]) = 10
sum(1,N,i,citations[i]) = 23
Boolean expressions can be used to select which papers are taken into account by the aggregate function. For instance, the Google My Citations i10-index (the number of publications with at least 10 citations) is
sum(1,N,i,citations[i] >= 10,1)
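The semantics of the two aggregate forms can be sketched in Python as follows. This is not the Calculator's parser: the aggregate helper is hypothetical, the range is taken as inclusive and 1-based as the examples above suggest, and returning 0 when no index passes the filter is an assumption of this sketch.

```python
import math

REDUCERS = {"min": min, "max": max, "sum": sum, "prod": math.prod}


def aggregate(func_name, start, end, expression, condition=None):
    """funcName(start,end,variable,[booleanExpression,]expression).

    expression and condition are callables of the sweeping variable.
    """
    values = [
        expression(i)
        for i in range(start, end + 1)  # inclusive range, as in sum(1,N,...)
        if condition is None or condition(i)
    ]
    if not values:
        return 0  # assumption: an empty selection evaluates to 0
    return REDUCERS[func_name](values)


cites = [10, 6, 4, 2, 1]
N = len(cites)


def citations(i):
    # 1-based access, matching the formula language examples.
    return cites[i - 1]


i10 = aggregate("sum", 1, N, lambda i: 1, condition=lambda i: citations(i) >= 10)
```

On the five-paper example, the min, max and sum aggregates evaluate to 1, 10 and 23, and the i10-index evaluates to 1.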
As in normalization formulas, allowed algebraic expressions include +, -, /, *, ^, (, ), with intuitive meaning.
Some examples
The h-index itself: max(1,N,x, sum(1,x,i,citations[i]>=x,1) )
The g-index: max(1,N,x,sum(1,x,i,citations[i])>=x*x,x)
Equivalent impact of the Top-10 articles: sum(1,10,i,citations[i])^0.5
e-index : (sum(1,h,i,citations[i])-h^2)^0.5
AR-index: sum(1,h,i,citations[i])^0.5 (corresponds to the AR index when citations are normalized by age).
Sum of citations in the last five years: sum(1,N,i,age[i] <= 5,citations[i])
H-Index points per year of scientific production: h/max(1,N,i,year[i]>1900,age[i]), where the condition year[i]>1900 avoids selecting papers of unknown date of publication. This is one of the criteria which was tentatively prescribed by the Italian Research Ministry for selecting Associate and Full professors.
Citations per year since the first publication: sum(1,N,i,citations[i])/max(1,N,i,year[i]>1900,age[i]) (1900 is arbitrarily chosen as filter year).
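As a cross-check of the nested aggregates, the h-index and g-index formulas listed in these examples can be transcribed literally into Python. The function names are hypothetical; the input list is assumed to be already sorted by decreasing citations, as the Calculator does before evaluating an index formula, and an empty selection is assumed to yield 0.

```python
def h_formula(citations):
    # max(1,N,x, sum(1,x,i,citations[i]>=x,1))
    n = len(citations)
    return max(
        (sum(1 for i in range(1, x + 1) if citations[i - 1] >= x)
         for x in range(1, n + 1)),
        default=0,  # assumption: no papers means index 0
    )


def g_formula(citations):
    # max(1,N,x, sum(1,x,i,citations[i])>=x*x, x)
    n = len(citations)
    candidates = [x for x in range(1, n + 1) if sum(citations[:x]) >= x * x]
    return max(candidates, default=0)  # assumption: no match means index 0
```

For papers with 10, 6, 4, 2 and 1 citations, the inner sum of the h-index formula evaluates to 1, 2, 3, 3, 2 for x = 1..5, so the maximum is h = 3; the g-index condition holds up to x = 4 (22 >= 16) and fails at x = 5 (23 < 25), so g = 4.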
Indices used in the 2012 Italian "Abilitazione Scientifica Nazionale"
These can be programmed in the following way:
Total number of citations normalized by the academic age of the author: add as index formula
sum(1,N,i,citations[i])/max(1,N,i,year[i]>1900,age[i])
Here sum(1,N,i,citations[i]) is the total number of citations reported, while max(1,N,i,year[i]>1900,age[i]) estimates the academic age of an author as the age of the oldest paper appearing in the current set of papers. A paper is selected only if its year is greater than 1900 in order to exclude papers with unknown date from the computation.
The h-c index: add as normalization formula 4*citations/age. The column displaying the h-index will have a new row. The content of this new row will report the h-c index in the h-index column (as well as, in other rows, the h-index with several other normalizations).
Number of journal papers in the last 10 years of activity, possibly normalized if the academic age is less than 10: this cannot currently be obtained, due to technical constraints on how Scholar presents data about whether a paper is published in a journal or not.
How co-authors clustering works
The Calculator automatically groups papers based on their likelihood of being written by the same author. This helps in reconstructing author careers when homonymies come into play. You should see in the left pane a number of co-authors clusters, each of which associates a group of papers with a group of co-authors. Papers corresponding to a given cluster can be conveniently selected and de-selected by using the "Select/Deselect" Papers button. Your selection can be reset using the button "Reset selections".
How it works: at the moment, clusters are built by looking up author names which are in common between papers, and by looking at common keywords between paper titles. For instance, if paper P0 has been co-authored by authors A,B,C and paper P1 by authors D,C,B,K, P0 and P1 will be grouped in the same cluster since they share B and C as common authors (in order to be clustered together, two papers must share at least two authors). Papers are also clustered when they share at least two keywords, as in, e.g., "Answer Set Programming: a Primer" and "A uniform integration of higher-order reasoning and external evaluations in answer set programming".
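The two clustering criteria can be sketched as follows. This is a simplified illustration and not the Calculator's clustering engine: the stop-word list, the tokenization of titles and the transitive merging of clusters (through a union-find structure) are all assumptions of this sketch.

```python
import re

# Assumption: a small stop-word list; the real keyword extraction is not documented.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "for", "to", "with"}


def keywords(title):
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


def related(p, q):
    # Two papers are linked if they share at least two authors or two keywords.
    shared_authors = len(p["authors"] & q["authors"])
    shared_keywords = len(keywords(p["title"]) & keywords(q["title"]))
    return shared_authors >= 2 or shared_keywords >= 2


def cluster(papers):
    """Return clusters as sorted lists of paper positions."""
    parent = list(range(len(papers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(papers)):
        for j in range(i + 1, len(papers)):
            if related(papers[i], papers[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With this sketch, P0 (authors A,B,C) and P1 (authors D,C,B,K) end up in the same cluster, and so do the two "answer set programming" titles, while a paper sharing a single author with P0 stays on its own.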
We are working on improving our clustering algorithms, so feedback is welcome!
Automatic Self-citation cleaning
It is possible to remove what are recognized as self-citations from the citation count of each individual paper.
A paper S citing P is considered a self-citation of P if P and S share at least one author. Author matching is done by comparing Google Scholar unique author IDs. Note that, for authors who do not have a personal account on Google Scholar, and thus have no unique author ID, self-citations are detected by comparing the authors' textual names, so the cleaning process can in principle be affected by homonymies.
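The matching rule can be sketched in Python as follows. The data layout (dictionaries with a name and an optional scholar_id field) and the function names are hypothetical, and the name normalization used for the fallback is an assumption; the Calculator's actual matching may differ.

```python
def author_key(author):
    # Prefer the Google Scholar author ID; fall back to the textual name.
    if author.get("scholar_id"):
        return ("id", author["scholar_id"])
    # Assumption: names are compared case-insensitively, ignoring extra spaces.
    return ("name", " ".join(author["name"].lower().split()))


def is_self_citation(paper_authors, citing_authors):
    # S is a self-citation of P if the two papers share at least one author.
    cited = {author_key(a) for a in paper_authors}
    citing = {author_key(a) for a in citing_authors}
    return bool(cited & citing)


def clean_citation_count(paper_authors, citing_papers):
    """Return (valid, invalid) citation counts for a paper.

    citing_papers is a list of author lists, one per citing paper.
    """
    invalid = sum(
        1 for authors in citing_papers if is_self_citation(paper_authors, authors)
    )
    return len(citing_papers) - invalid, invalid
```

The name-based fallback is exactly where homonymies can creep in: two different people called "J. Doe" without a Scholar profile would be treated as the same author.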
How to use it: if the advanced interface is enabled, it is possible to clean the self-citations of a specific paper P by acting on the respective Clean citations button. This will sweep across the papers citing P and eliminate self-citations from the citation count of P. After the cleaning process has finished, you should notice that the Invalid citations field in P's record is incremented, while a label counting Valid citations appears close to the usual "Cited by" field. All the bibliometric indices are updated accordingly. The feature is available on individual papers only and can be slow because of the necessary query Throttling.
Frequently asked questions
Why is my h-index not what I expect? Why do displayed results differ by day/browser/language?
Computed index values might differ from those of software tools like Publish or Perish, mainly because Publish or Perish is hardwired to query http://scholar.google.com regardless of your actual locale, while the H-Index Calculator works on the locale of your choice (scholar.google.it, scholar.google.co.uk etc.), and results from your local scholar.google.* might differ.
Also note that queries submitted to Google Scholar such as author:"John Doe" return different (and tighter) results than author:John author:Doe, and also different from author:J author:Doe. Moreover, you should take into consideration that selecting the field of expertise and narrowing your search with the filters available in the Scholar Advanced Interface will change index values.
It is recommended, when collecting bibliometric data, to record:
- The day of the collection activity;
- The national locale which has been used (.com, .it, etc.);
- The browser name and version;
- Whether and which filters were in place (by year, by author, by scientific discipline, none at all);
- The list of papers on screen which the bibliometric indices were computed on.
Technical: for a sounder, uniform and comparable analysis, it is a good idea to record the whole content of the HTTP requests sent, especially the User-Agent, Cookie, Host and Accept-Language header fields.
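One possible way to keep such a record is sketched below. The `CollectionRecord` class and all its field names are hypothetical (the Calculator does not produce this structure); the `query` field is an addition suggested by the remark on query forms above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CollectionRecord:
    """Hypothetical record of one bibliometric data collection session."""
    collected_on: str       # day of the collection, e.g. in ISO format
    locale_domain: str      # e.g. "scholar.google.it"
    browser: str            # browser name and version
    query: str              # exact query string submitted to Google Scholar
    filters: dict = field(default_factory=dict)          # e.g. {"year_from": 2010}
    papers: list = field(default_factory=list)           # papers the indices were computed on
    request_headers: dict = field(default_factory=dict)  # User-Agent, Cookie, Host, Accept-Language

    def to_json(self):
        """Serialize the record so it can be archived next to the computed indices."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Storing the JSON output together with the index values makes it possible to explain later why two collections of the same author's data disagree.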
Hello, I'm professor John Doe. Is there a chance I can tell my papers apart from all the other John Does in the academy?
The automated disambiguation of author homonyms is a hot research topic right now. You might want to try the Author Cluster feature of the Calculator. It groups papers into clusters of authors who co-authored some paper together, and is usually very helpful in telling your papers apart from those of your homonyms. Some manual adjustment might be needed.
Can I use the Calculator on keywords other than author names?
Yes, you can in principle measure the impact of technologies, scientific names, acronyms, journals, conferences and the hype keyword of the moment by using adequate keywords on Google Scholar and checking their impact using the Calculator.
Of course your research should be conducted with some care, since inaccuracies are just around the corner: a clear and unambiguous methodology should be set beforehand. E.g. it should be clearly defined how to treat synonyms like 'www' and 'world wide web', and where and when data is collected. Recall again that Scholar data changes from locale to locale and continuously evolves over time.
After about 50 searches (argh), I get the following 'friendly' message from Google: "We're sorry... ... but your computer or network may be sending automated queries"
I get a captcha asking whether I am a robot!
This is not caused by the presence of the extension in your browser. Google Scholar prevents massive querying of its servers, so as soon as you perform massive activity, i.e. a lot of queries in a very short time, you will get the above message, whether or not the Calculator is installed.
To mitigate the temptation to spend the night looking up the figures of all your colleagues, the Calculator has an internal query rate throttling mechanism which reduces the possibility of massive user activity. There are, however, some things you can do to further mitigate the issue:
- Don't query quickly: the Scholar portal is conceived for normal, human users, not for massive automated data collection;
- Use the function "add ALL the information" wisely. This function collects data from Scholar's additional pages by issuing 1 additional query on your behalf for each additional page. For instance, if you search for the author name Carlo Rubbia, you will get 4740 results (as of Jan 2013), only 10 or 20 of which are displayed by default. The function can collect results up to the 1000th item by issuing all the additional queries to the Scholar portal for you, which of course increases your traffic rate to and from Google Scholar servers. Prefer the function "add X results", which adds only a limited number of results;
- Use the feature "refine ALL bibliographic entries" wisely. Although this feature is very powerful and can complete partial co-author lists and partial paper information, it issues 1 additional query per paper (i.e. 100 queries on a page with 100 results), thus increasing traffic to and from Google Scholar servers. Prefer the feature "Refine this author list", which works on a single paper using just 1 additional query;
- Use the Clean citations button wisely. Although the query throttling system will slow this process down, it should still be used sparingly.
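To get a feeling for the traffic these functions generate, the query counts mentioned in the list above can be estimated with a small sketch. The function names are hypothetical, and two assumptions are made: the first result page is already loaded and costs no extra query, and each additional page holds as many results as the first one (10 or 20).

```python
import math


def add_all_queries(total_results, page_size=10, cap=1000):
    """Estimate the additional queries needed to collect every reachable result.

    One additional query is issued per additional page; results beyond the
    cap-th item (1000 in the text above) are not collected.
    """
    reachable = min(total_results, cap)
    pages = math.ceil(reachable / page_size)
    return max(pages - 1, 0)  # the first page is assumed to be already displayed


def refine_all_queries(papers_on_page):
    """One additional query per paper shown on the page."""
    return papers_on_page
```

Under these assumptions, the Carlo Rubbia example (4740 results) costs 99 additional queries with 10 results per page and 49 with 20 per page, while refining all entries on a 100-result page costs 100 queries.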
Can I quickly regain access to Google Scholar once I'm blocked?
Usually it is sufficient to solve the CAPTCHA proposed by Google Scholar or, when no CAPTCHA is shown, to just wait a few minutes.
As of 2016 (but one should assume that Google Bot Detectors will change their strategy over time), one can quickly regain access by either:
- Opening a new anonymous session (CTRL+SHIFT+N)
- Clearing your cookies cache (Google "Clear my cookies cache")
- Changing your public IP address
(Do these actions only if you understand their technical implications)
My search results in ">10" values. Can I get a specific value instead of just ">10"?
Yes, just add more results to the displayed page using the links in the text that looks like Want to add *10*, *100* or all results ?. We are sorry for not displaying 100 results by default, but this is due to the recent Google Scholar policy change (not to the Calculator), which allows a maximum of 10 or 20 results per query.
I'm getting an 'Error: document.getElementById(...) is null' error when running the addon
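The sketch below shows one plausible reason for such a value; it is an assumption, not a description of the Calculator's code. An h-index computed only on the displayed papers can never exceed their number, so when every displayed paper supports the index and more results exist, only a lower bound is known. The function names and the `total_results` parameter are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def displayed_h_index(citations, total_results):
    """Format the h-index of the displayed papers, flagging a saturated value.

    The index is saturated when it equals the number of displayed papers
    while further, not yet displayed, results exist: the real value cannot
    be determined without loading more results.
    """
    h = h_index(citations)
    saturated = h == len(citations) and total_results > len(citations)
    return f">{h}" if saturated else str(h)
```

For instance, ten displayed papers with at least ten citations each, out of thousands of results, yield the label for a saturated value; adding more results to the page gives the computation enough data to produce a specific number.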
You're most probably running an obsolete version of the addon, most likely under Mozilla Firefox. You should switch to the Chrome version.
How can I remove the Calculator?
You can temporarily disable the Calculator using the 'disable' toggle which appears right on top of the Calculator information box. For permanent removal:
- Mozilla Firefox: go to Tools -> Add-ons -> Extensions, then select and disable/remove the Calculator using the appropriate button.
- Google Chrome: Menu -> Settings -> Extensions, then either use the 'Trash' icon for permanent removal, or the checkbox for temporarily disabling the Calculator.
Which permissions does the Calculator ask for, and why?
The Calculator asks for:
- Access to all web sites: this is used for completing author lists by looking on the Web (digital libraries, etc.). No user data is sent to any external website.
- Possibility to add items to context menus: used for letting the user query any text selection on any web site when right-clicking on some selected portion of text.
- Opening tabs: needed for opening new Google Scholar windows.
How can I cite the Calculator in my scientific publication?
You can use the following: Ianni, G. et al. (2009) Scholar H-Index Calculator, available from https://www.mat.unical.it/ianni/wiki/ScholarHIndexCalculator
Release Notes and history
- Oct 15th, 2017: Ver 4.2.5 Adapted to the newer Google Scholar layout.
- Jul 20th, 2017: Ver 4.2.3 Fix for properly handling UTF-8 diacritics in author names.
- Feb 15th, 2016: Ver 4.2.2 Added splash screen.
- Jan 30th, 2015: Ver 4.2.1 Small fix for supporting the nowadays default https URL scheme.
- Jan 20th, 2015: Ver 4.2. Introduced automatic self-citations cleaning.
- November 2013: Ver 4.0. New user interface and paper clustering by co-authors groups (experimental)
- July 20th, 2013: Ver 3.4. Added context menu: users can select text, right-click, and choose from the context menu to query the selected text on Google Scholar.
- July 5th, 2013: Ver 3.3. Added query rate throttling code. Prevents users from querying Google Scholar at an excessive pace (which would trigger bot blocking).
- May 10th, 2013: Added show/hide toggle. Bibliometric data can be hidden on demand.
- May-Jun 2012. 3.0 Release with many new features:
- Possibility to add custom normalization and index formulas (see the Custom Formulas section)
- 'Refine author list' and 'Refine all bibliographic entries' functions now much more accurate (can correctly extract lists of thousands of authors in almost all cases)
- Can now compute h-index values greater than 100
- Support for the new Scholar Modern look
- Many bug fixes and internal code optimization
- Jan 13th, 2012. 2.3.5 Adapted to Google Scholar page layout changes. Other minor bug fixing.
- May 31st, 2011. 2.3 Improved layout. Added indices normalized per age.
- Apr 26th, 2011. 2.2 Authors names are now clickable and point to the corresponding query on Scholar.
- Feb 24th, 2011. 2.1 New advanced refinement button per paper. Other minor improvements and bug fixing.
- Oct 23rd, 2010. 2.0 Introduction of the advanced fine-tuning interface. Minor bug fixing and accuracy improvement in border cases.
- Nov 26th, 2009. 1.4 and 1.4.1 Added radio buttons for selecting the query type on the fly. Improved author parsing.
- Nov 21st, 2009. 1.3.3 Added on hover tooltips.
- Nov 19th, 2009. 1.3.2 Improved appearance. Added link to the same query with 100 results if h-index and g-index fail to compute properly.
- Nov 17th, 2009. 1.3 Introduced normalized indices and support for CiteSeerX (discontinued as of 2.0 version).
- Oct 26th, 2009. 1.2.3 Introduced checks for overflow of h,g,e,deltah,deltag values.
- Oct 21st, 2009. 1.2.1 A minor fix on delta-H computation.
- Oct 20th, 2009. 1.2. Introduced delta-H and delta-G. Minor fixes.
- Oct 16th, 2009. 1.1.1. Fix bug on e-index computation.
- Oct 11th, 2009. 1.1. Fix bug related to Google Scholar not strictly respecting citation descending order. Fix some citation values evaluated as NaN.
Publications
Francesco Cauteruccio and Giovambattista Ianni. A domain meta-wrapper using seeds for intelligent author list extraction in the domain of scholarly articles. TPDL 2013. LNCS 8092, pp. 313--318, 2013.
Francesco Cauteruccio and Giovambattista Ianni. A domain meta-wrapper using seeds for intelligent author list extraction in the domain of scholarly articles. Longer Technical Report.