2017 Global: UNICEF GEROS Meta-Analysis 2016

Principal Author: Joseph Barnes; Evaluation Manager: Ada Ocampo

Executive summary

UNICEF GEROS Meta-Analysis 2016: An independent review of UNICEF evaluation report quality and selected trends, 2009-2016


This review is a meta-analysis of the quality of the evaluation reports submitted to UNICEF's Global Evaluation Reports Oversight System (GEROS) during 2016. It synthesizes the results of 101 evaluation reports, which an independent team reviewed for quality against UNICEF and UN-SWAP standards. It presents findings at the global level and highlights trends across regions, sectors and quality assessment criteria. Besides contributing to a wider body of knowledge, this report integrates the requirements for reporting on the UN-SWAP evaluation performance indicator.


The purpose of the meta-analysis is to contribute to the three objectives of GEROS (particularly Objective 1):

  • Objective 1: Enabling environment for senior managers and executive board to make informed decisions based on a clear understanding of the quality of evaluation evidence and usefulness of evaluation reports;
  • Objective 2: Feedback leads to stronger evaluation capacity of UNICEF and partners;
  • Objective 3: UNICEF and partners are more knowledgeable about what works, where and for whom.


This meta-analysis was conducted in April 2017, after all evaluation reports had been assessed, submitted to the UNICEF Evaluation Office (EO) and accepted. Quantitative data on the scores for different aspects of the reports were compiled in Excel. The analysis was carried out across multiple axes:

  • Regional trends (regional and country levels)
  • Trends by quality assessment criteria (including across time):
    Object of the evaluation; Evaluation purpose, objectives and scope; Evaluation methodology; Findings; Conclusions and lessons learned; Recommendations; Evaluation principles (gender, human rights and equity); Report structure, logic and clarity; Executive summary
  • Type of management
  • Purpose
  • Scope
  • Results level
  • Evaluation Taxonomy
  • Strategic Plan Objective Area correspondence
  • UN-SWAP performance and trends

Reviewers' comments on each evaluation quality assessment were filtered by section and by overall rating. They were then synthesized to identify common themes and to explore possible causal links between recurrent issues and particular ratings. In addition, the reviews were examined to identify good practice in the reports. Quantitative and qualitative data were triangulated and compared with longitudinal data from the four previous years to map key trends and patterns.


Overall, more reports met UNICEF standards than ever before. Most reports (74%) fully met UNICEF evaluation report standards. This is an improvement on 2015 (53%) and matches the level achieved in 2014. This figure includes the 6% of reports rated as highly satisfactory, a share consistent with previous years. Only one report was fully unsatisfactory. The remaining 25% of reports were rated as 'fair', meaning they can be used with caution, taking account of their limitations.

The revised evaluation quality assurance tool for 2016 allows additional levels of disaggregation beyond the five main classifications (highly satisfactory, satisfactory, fair, unsatisfactory, missing). Deeper analysis of ratings (see Figure 3) reveals that most evaluation reports reaching UNICEF standards fall in the lower band of the satisfactory rating. This suggests that recent improvements have been enough to lift reports of the quality rated 'almost satisfactory' in 2015 (broadly equivalent to 'fair' in the 2016 system) into the satisfactory range. Very few reports (only 9% in 2016) remained in the lower band of 'fair' or were fully 'unsatisfactory'.

Nine evaluation reports were rated at the very top of the 'satisfactory' range, with a mean score x in the range 3.3 ≤ x < 3.5 out of a maximum of 4 points (reports scoring 3.5 or over are rated 'highly satisfactory'). With improvements in one or two sections, these reports would likely have been rated 'highly satisfactory'. They include one report each from CEE/CIS, LACR and WCAR, and two reports each from EAPR, ROSA and the Evaluation Office.

The strongest aspects of evaluation reports in 2016 were 'Purpose, objectives and scope' and 'Recommendations'. 'Executive summaries' were also rated relatively strongly. All of these elements are key contributors to the utility of evaluations. By contrast, the sections on evaluation principles (HRBAP and gender equality), lessons learned and methods received the lowest ratings. These results indicate that evaluations are focusing on utility for primary intended users, while credibility and learning need more attention. The relative weakness of the methods and conclusions/lessons sections repeats the pattern seen in 2015 evaluations.

Comparing evaluation reports across the UNICEF regions shows a 53% increase in the number of reports submitted by ESAR in 2016. ESAR was already the largest region for evaluation in 2015. In 2016, 14 ESAR reports were rated 'satisfactory' in meeting UNICEF standards, compared to 12 in 2015, but a substantial proportion (42%) were rated 'fair'. The number of reports rated 'satisfactory' or 'highly satisfactory' increased substantially in LACR, in WCAR and among corporate evaluations. All other regions maintained their 2015 number of reports meeting UNICEF standards. 'Highly satisfactory' reports were spread across the regions. A cluster of three in EAPR stands out, representing 25% of that region's evaluations.

Quantitative analysis of alignment with Strategic Plan Objective Areas (including cross-cutting issues) found that most evaluations cover multiple thematic areas. Similar distributions of quality were found across all thematic areas, with a slightly higher proportion of evaluations fully meeting UNICEF standards in reports that addressed humanitarian action or gender equality as cross-cutting issues. The largest body of evaluative knowledge was generated for child protection and health, with education, gender equality and social inclusion also being covered by a large number of evaluations.

The average UN-SWAP score for integration of gender equality in 2016 was 6.2, which is classified as 'Approaching Requirements'. This matches the 2015 score but covers a much larger portfolio of reports. Reports were slightly stronger at integrating gender into the scope, indicators, criteria and questions of evaluations. The priority for improving UN-SWAP performance remains ensuring that gender analysis informs evaluation findings, conclusions and recommendations.

Most evaluations used mixed methods, and 19% were purely qualitative. Of the mixed-methods evaluations, 76% met UNICEF standards, compared to 68% of qualitative evaluations. All evaluations rated 'highly satisfactory' used mixed methods. The type of evaluation was strongly associated with report quality. The 2015 meta-analysis concluded that UNICEF is institutionally stronger at programme evaluation than at other levels of evaluation. A different set of evidence from 2016 indicates that this conclusion still holds.

Conclusions and Recommendations:

UNICEF evaluation reports in 2016 continued the longer-term trend of improving quality, and maintained this advance across an expanded portfolio of 101 evaluations covering at least 66 countries
Recommendation 1: To address recurrent weaknesses, ensure that all evaluation reports include a clear explanation of the evaluation design, a specific sub-section on the integration of ethical guidance, and an examination of unexpected effects

Evaluation quality data indicate a strong focus on utility in UNICEF evaluations, in terms of both accountability and learning for specific interventions, but developing generalized lessons learned remains a challenge
Recommendation 2: Strengthen guidance and capacity development across the decentralized evaluation system, including in HQ divisions, for developing lessons learned that add to common knowledge and can be generalized beyond the object of the evaluation

While the integration of human rights based approaches and gender equality commitments has generally improved over time, it has not kept pace with advances in other areas of quality, and the expanded evaluation portfolio has yet to meet the requirements of UN-SWAP
Recommendation 3: Apply the revised UNICEF ToR and evaluation report checklists and UN Evaluation Group guidance to ensure that evaluations fully integrate HRBAP and UN-SWAP requirements at the inception stage, with particular focus on using gender as an analytical lens across all evaluation criteria and questions

Terms of reference and inception phases are succeeding in establishing a clear purpose and scope for evaluations. This strength can be built upon by improving theories of change and the integration of results-based management systems in evaluations
Recommendation 4: Encourage evaluations to include detailed examinations or reconstructions of the theories of change for evaluation objects, explaining the causal relationships, levels of change, assumptions and risks that shaped the thinking of intervention designers
