2012 Global: UNICEF Global Evaluation Report Oversight System (GEROS) 2012: Quality Review of 2011 Evaluation Reports
Author: Joseph Barnes, Hatty Dinsmore, Sadie Watson, Annalize Struwig [IOD PARC]
UNICEF holds a long-standing commitment to independent assessment of the quality of evaluation reports produced by its country and regional offices all over the world, as well as HQ divisions. This is the third report to use the current methodology to assess the quality of evaluation reports against UNICEF standards1.
This quality review process covered all 2011 evaluation reports submitted to the UNICEF Global Evaluation Database by the cut-off date of April 2012. The standards against which evaluation reports are assessed are set by the UNICEF deployment of the United Nations Evaluation Group (UNEG) global evaluation report standards.
Specific objectives of the review are to: 1) review and rate (with justifications) the quality of the main elements of evaluation reports; 2) provide constructive feedback to improve future evaluations; 3) provide a global analysis of key trends; and 4) provide actionable conclusions and recommendations to improve the evaluation function.
Reports were selected as responding to the definition of evaluation according to UNICEF standard criteria. This led to 86 full reviews. An expert familiar with the UNICEF evaluation function undertook each review after completing a dedicated induction process. Three levels of quality assurance were applied: basic completeness; sampled peer-reviewing; and a right-to-challenge option exercised by the UNICEF Evaluation Office.
The full review tool is presented in the Annexes. It was originally co-designed by UNICEF and IOD PARC in 2010 and redesigned in 2011 based upon experience gained from implementing the approach. Each of the 58 questions and 6 sections, and the overall report, is given one of four ratings: ‘very confident’, ‘confident’, ‘almost confident’, or ‘not confident’2. In addition to ratings, commentary is provided against each section and sub-section, suggestions for future improvement are provided for each section, and executive feedback is provided for each section and the overall report.
The review process generated an extensive dataset to inform the trend analysis, including 5,829 quantitative ratings and 3,654 sections of qualitative text. In order to distil the key findings from this data, a multi-stage process was adopted consistent with the previous two meta-evaluations.
Time constraints on the depth of data analysis were mitigated as far as possible through triangulation of quantitative and qualitative patterns in the data. The methodology enables assessment only of the evaluation report, not of the evaluation process itself. Furthermore, the approach can identify only the ‘headline’ findings; more nuanced or infrequently occurring issues may exist for individual readers to find within the reviews themselves.
For the first time, evaluation reports resulting from evaluations submitted by the Evaluation Office were considered separately from evaluations submitted by other corporate departments and offices. Only one country-led evaluation was included in the sample frame.
Overall, the review found a year-on-year improvement in performance in terms of more reports being rated as meeting UNICEF Evaluation Standards (42% in 2011, 40% in 2010) and fewer reports identified as having fundamental problems (23% in 2011, 30% in 2010). At the top end of the scale, four reports were rated as very confident3 overall, with ten reports having at least one individual section rated as very confident.
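The year-on-year comparisons above are differences between two percentages, so it can help to separate the change in percentage points from the relative change. The following is a minimal sketch using only the four figures quoted in the preceding paragraph; the function names `point_change` and `relative_change` are illustrative assumptions and are not part of the GEROS methodology.

```python
def point_change(current, previous):
    """Difference between two percentages, in percentage points."""
    return current - previous


def relative_change(current, previous):
    """Change expressed as a percentage of the previous value."""
    return (current - previous) / previous * 100


# Figures quoted in the text: (2011 value, 2010 value), both in percent.
figures = {
    "reports meeting UNICEF standards": (42, 40),
    "reports with fundamental problems": (23, 30),
}

for label, (y2011, y2010) in figures.items():
    points = point_change(y2011, y2010)
    relative = relative_change(y2011, y2010)
    print(f"{label}: {points:+d} points ({relative:+.1f}% relative)")
```

Read this way, the share of reports meeting standards rose by 2 percentage points, while the share with fundamental problems fell by 7 percentage points.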
Our qualitative analysis suggests that variability is the major challenge faced in delivering reports that meet UNICEF standards. Reports rated as Not Confident tend to show misunderstanding of the evaluation process, lack evidence, or present insufficient analysis. As reports progressively rate more highly, variability tends to be evident in the way that human rights, gender, and equity are addressed.
There is a large body of reports that are not fundamentally flawed in evaluative terms, but are insufficiently cognizant of UNICEF norms to meet the required standards. Thirty reports could have rated as ‘confident’ with just a little more work to align them with UNICEF norms.
More 2011 reports included Terms of Reference: 71% compared to 64% in 2010. Qualitative analysis suggests that around 50% had a positive overall effect on the final reports.
The majority of reports, 89%, relate to the national or sub-national level. Reports at this level also record the lowest proportions of ‘confident’ ratings. The data would appear to suggest that evaluative capacity is currently concentrated at the regional level.
Joint UN evaluations register the strongest overall performance with 83% meeting UNICEF standards in 2011. Programme evaluations registered the largest growth in terms of numbers and also rated the strongest performance, with 58% of reports meeting or exceeding UNICEF standards.
Over a three-year period, the number of impact-level evaluations has declined whilst the number of outcome-level evaluations has grown. Impact-level evaluations remain the strongest in terms of report quality, with 61% meeting UNICEF standards compared to 41% of outcome-level and 22% of output-level evaluations.
Multi-sector evaluations have doubled, from 12% in 2009 to 25% in 2011. The largest reduction was in HIV/AIDS evaluations, falling from 9% in 2009 to 4% in 2011. Cross-cutting evaluations also fell 5 percentage points, from 19% in 2009 to 14% in 2011. Both Young Child Survival and Development, and Basic Education and Gender Equality saw significant improvements in quality, both registering around ten percentage-points more reports rated ‘confident’. Child Protection and HIV/AIDS evaluations remain areas of concern.
There is a clear shift away from internal management of independent evaluators towards external management. There is also a trend towards more formative-stage evaluations.
The range between the lowest rated section and the highest rated section has steadily closed, from 20 points in 2009 to 12 points in 2011. Despite the continuing challenge of variability, reports are gradually becoming more consistent across sections. As with previous years, the weakest sections related to recommendations and lessons learned, and to the structure and style of reports.
There was a step-change in ratings relating to human rights, gender and equity, from 18% meeting UNICEF standards in 2010 to 33% in 2011, and in ratings for stakeholder participation, from 40% in 2010 to 52% in 2011. Ethics registers 59% of reports as Not Confident, but the proportion of reports meeting UNICEF standards on ethics doubled, from 10% in 2010 to 22% in 2011.
Five regions recorded increases in performance, and three regions recorded decreases. The biggest gains were in ROSA (+25%), EAPRO (+16%) and CEECIS (+15%). ESARO remains the highest performing region at 53% satisfactory, and reports led by the Evaluation Office remain the strongest overall at 80%. TACRO and WCARO recorded the most significant drop in performance, leading to the lowest overall ratings in terms of UNICEF standards.
- A trajectory of improvement in report quality is emerging. The percentage of reports classified as Confident to Act and Very Confident to Act has grown 17% since the first meta-evaluation published in 2010.
- 57% of reports submitted for GEROS rating still fall short of UNICEF evaluation standards.
- Inconsistent quality appears to be the major challenge facing reports of all ratings, including variability between the language groups.
- There is a clear trend toward more independent and formative outcome-level reports.
- Three times as many impact evaluations as output evaluations rate as satisfactory, and twice as many regional as sub-national evaluations rate as satisfactory. Bigger is delivering better in terms of report quality.
- Clear language and structure are as important as quality content for achieving UNICEF standards.
- The argument for fewer evaluations made in previous meta-evaluations was wrong. Longitudinal data suggests that it is possible to successfully deliver increased quantity and quality.
UNICEF Evaluation Office
- Report quality is improving: focus on accelerating this through synthesising lessons learned and cross-fertilising these insights between regions
- Provide quick reference guidance on methodologies, limitations and ethics
- Investigate what institutional conditions are driving inconsistent quality within and across evaluation reports
Regional and Country Offices together
- Efforts to improve quality are showing signs of working in some places: find out what is working well and keep doing it
- Increase the proportion of reports rated as confident by going the last mile with structure, language and presentation
- Experiment with ways to reduce the number of small sub-national and output-level evaluations
1 The methodology only considers the quality of evaluation reports, not the quality of the evaluation process or its effectiveness.
2 The concept of “confidence” relates to UNICEF’s Evaluation Report Standards; in which a report that satisfactorily meets these standards is one where decision makers can use the findings, conclusions, recommendations, and lessons learned with confidence. Where relevant, a N/A option is also provided.
3 Evaluation of Phases I and II of UNICEF’s Programme “Adolescents: Agents of Positive Change”, MENA Region; Evaluation of the Safe and Caring Child-Friendly Schools (SCCFS) Programme 2007-2010, South Africa; Formative Evaluation of Improvement of Mother and Child Health Services, Uzbekistan; Formative Evaluation of the United Nations Girls' Education Initiative (UNGEI), Egypt, Nepal, Nigeria, Uganda.