Baccini, A., De Nicolao, G. (2015). Do they agree? Bibliometric evaluation vs informed peer review in the Italian research assessment exercise.
Do they agree? Bibliometric evaluation vs informed peer review in the Italian research assessment exercise
Baccini, Alberto
2015-01-01
Abstract
During the Italian research assessment exercise, the national agency ANVUR performed an experiment to assess agreement between grades obtained through informed peer review (IR) and through bibliometrics. A sample of research outputs was evaluated with both methods, and concordance was analyzed with weighted Cohen's kappas. According to ANVUR, the results indicated an overall "more than adequate" agreement, which "fully justifies" the choice of using both techniques jointly in the assessment. However, according to available statistical guidelines for kappa values, the degree of agreement must be interpreted, in all research fields, as poor or, in a few cases, at most fair. The only notable exception is Area 13 (economics and statistics) and its sub-areas, which show moderate agreement. Yet a statistical meta-analysis rejects the hypothesis that the kappas from Area 13 share the same distribution as those from the other areas. Indeed, scrutiny of the experiment protocol adopted by the Area 13 panel reveals substantial modifications with respect to the protocols of all the other areas, to the point that the results for Area 13 must be considered fatally flawed. The evidence of poor to fair concordance supports the conclusion that IR and bibliometrics do not produce similar results. As a consequence, the final results of the Italian research assessment may depend on the mix of instruments used to evaluate research outputs. The conclusion reached by ANVUR must be reversed: the available evidence does not justify at all the joint use of both techniques within the same research assessment exercise.
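The following is a rough illustration of the statistics the abstract refers to. It computes a weighted Cohen's kappa for two sets of ordinal grades given to the same outputs, such as IR grades and bibliometric grades. It then maps the value onto one widely used set of interpretation bands, those of Landis and Koch (1977). The abstract does not say which weighting scheme ANVUR used or which guideline the authors applied. The linear and quadratic disagreement weights and the Landis and Koch bands are therefore assumptions, and the function names and toy grades are hypothetical. This is a minimal sketch, not the authors' or ANVUR's procedure.

```python
from collections import Counter
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int, weighting: str = "linear") -> float:
    """Weighted Cohen's kappa for two ordinal ratings coded 0..n_categories-1.

    kappa_w = 1 - sum_ij(w_ij * O_ij) / sum_ij(w_ij * E_ij), where O_ij are the
    observed joint counts, E_ij = row_i * col_j / n are the counts expected under
    independence, and w_ij are disagreement weights (assumed linear or quadratic).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if any(not 0 <= r < n_categories for r in (*rater_a, *rater_b)):
        raise ValueError("ratings must be integers in 0..n_categories-1")

    k, n = n_categories, len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts O_ij
    row = Counter(rater_a)                      # marginal counts of rater A
    col = Counter(rater_b)                      # marginal counts of rater B

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)                # 0 on the diagonal, 1 at the extremes
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(weight(i, j) * observed[(i, j)] for i, j in cells)
    exp_dis = sum(weight(i, j) * row[i] * col[j] / n for i, j in cells)
    if exp_dis == 0:
        raise ValueError("kappa undefined: both raters put every item in one category")
    return 1.0 - obs_dis / exp_dis


def landis_koch_label(kappa: float) -> str:
    """Map a kappa value onto the Landis and Koch (1977) agreement bands (assumed guideline)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical grades for eight outputs on a four-class ordinal scale (0 = lowest).
    peer = [3, 2, 2, 1, 0, 3, 2, 1]
    biblio = [3, 1, 2, 2, 1, 2, 3, 0]
    kw = weighted_kappa(peer, biblio, n_categories=4)
    print(f"weighted kappa = {kw:.3f} ({landis_koch_label(kw)})")
```

Weighting matters because near-misses between adjacent grades are penalized less than disagreements across the scale. Quadratic weights penalize distant disagreements more heavily than linear ones, so the two schemes can yield noticeably different kappas on the same data.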
https://hdl.handle.net/11365/983838