

2018 EO: GEROS Meta-Analysis 2017

Authors: Tom Orrell and Ada Ocampo

Executive summary

UNICEF GEROS Meta-Analysis 2017. An independent review of UNICEF evaluation report quality and selected trends in 2017


This review is a meta-analysis of the quality of the evaluation reports submitted to UNICEF’s Global Evaluation Reports Oversight System (GEROS) during 2017. It synthesizes the results of 88 evaluation reports, reviewed for quality by an independent team against UNICEF and UN-SWAP standards, and shares findings at the global level, as well as highlighting trends across regions, sectors and quality assessment criteria. This report contributes to a wider body of knowledge in addition to meeting requirements for reporting on the UN-SWAP evaluation performance indicator.


The purpose of the meta-analysis is to contribute to achieving the three overall objectives of GEROS (particularly Objective 1), of which the meta-analysis is only one part:
Objective 1: Enabling environment for senior managers and executive board to make informed decisions based on a clear understanding of the quality of evaluation evidence and usefulness of evaluation reports;
Objective 2: Feedback leads to stronger evaluation capacity of UNICEF and partners;
Objective 3: UNICEF and partners are more knowledgeable about what works, where and for whom.


This meta-analysis was conducted in March–June 2018, once all of the evaluation reports for 2017 had been assessed, submitted to the UNICEF Evaluation Office and accepted. Quantitative data on the scores for different aspects of the reports were compiled in Excel. Analysis was carried out across multiple axes:

  • Regional trends (regional and country levels)
  • Trends by quality assessment criteria (including across time)
    Object of the evaluation; Evaluation purpose, objectives and scope; Evaluation methodology; Findings; Conclusions and lessons learned; Recommendations; Evaluation principles (gender, human rights and equity); Report structure, logic and clarity; Executive summary
  • Type of management arrangements for the evaluation
  • Purpose
  • Scope
  • Results level
  • Strategic Plan Objective Area correspondence
  • UN-SWAP performance and trends

The comments made by reviewers on each evaluation quality assessment were filtered by section and overall rating, and then synthesized to identify common themes and thus explore any causal links between recurrent issues and particular ratings. In addition, the reviews were trawled to identify good evaluation practices from the reports. Quantitative and qualitative data were triangulated, and compared with longitudinal findings from the four previous years to map key trends and patterns.

Findings and Conclusions:

Overall, the proportion of reports meeting UNICEF standards has been maintained since the previous year: the majority of reports (72%) fully met UNICEF evaluation report standards. Of these, 15% were rated as highly satisfactory, a substantial improvement over the 6% achieving this standard in the previous year, and 57% were rated as satisfactory. No report was fully unsatisfactory. The remaining 28% of reports were rated as ‘fair’, meaning that they can be used with caution, taking account of their limitations.
The evaluation quality assurance tool used in 2016 and 2017 allows for additional levels of disaggregation beyond the five main classifications (highly satisfactory, satisfactory, fair, unsatisfactory, missing). Deeper analysis of ratings (see Figure 3) reveals that the majority of evaluation reports reaching UNICEF standards are in the lower band of the satisfactory rating. Furthermore, 15% of reports, up from 9% in the previous year, were rated in the lower band of ‘fair’, indicating that the pattern of increasing quality is not assured but requires continuous strengthening of the evaluation function.
Seven evaluation reports were rated at the very upper end of the ‘satisfactory’ range, with a mean score (x) of 3.3 ≤ x < 3.5 out of a maximum of 4 points (reports scoring 3.5 or above are rated as ‘highly satisfactory’). With improvements in one or two sections, these reports would likely have been rated ‘highly satisfactory’.
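The banding rule described above can be sketched as a simple threshold classifier. Note that the report only states the 3.5 cutoff for ‘highly satisfactory’; the 3.0 and 2.0 lower bounds below are hypothetical placeholders, not the actual GEROS thresholds.

```python
def classify(mean_score: float) -> str:
    """Map a mean section score (0-4 scale) to a GEROS-style rating band.

    Only the 3.5 threshold is stated in the report; the 3.0 and 2.0
    lower bounds are illustrative assumptions.
    """
    if mean_score >= 3.5:
        return "highly satisfactory"
    if mean_score >= 3.0:   # assumed lower bound for 'satisfactory'
        return "satisfactory"
    if mean_score >= 2.0:   # assumed lower bound for 'fair'
        return "fair"
    return "unsatisfactory"

# A report with mean score 3.4 sits in the upper band of 'satisfactory',
# just one or two section improvements short of 'highly satisfactory'.
print(classify(3.4))
```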
The strongest aspects of evaluation reports in 2017 were ‘purpose, objectives and scope’, ‘structure’ and ‘recommendations’, consistent with previous years. ‘Findings’ were also rated relatively strongly. These elements are all key contributors to the utility of evaluations. By contrast, the evaluation principles (HRBAP and gender equality), lessons learned and methodology sections received the lowest ratings. These results indicate that evaluations are focusing on utility for primary intended users; more attention needs to be paid to strengthening credibility and learning. This mirrors the 2015 and 2016 evaluation trends.
Comparison of evaluation reports across the UNICEF regions reveals a reduction in the number of reports from ESAR (normally the largest region for evaluation), but an improved average quality in the reports submitted. ROSA and EAPR also produced slightly fewer reports, but improved in terms of the proportion of reports rated as satisfactory. By comparison, ECAR saw a large increase in the number of reports, with 7 of these rated as ‘highly satisfactory’ (1 in 2016), although, unlike in 2016, it also had 3 ‘fair’ reports. WCAR increased its number of reports, but the ‘additional’ evaluations were rated ‘fair’. LACR also saw an increase in the number and percentage of ‘fair’ evaluation reports, while in MENA one more report than in 2016 was rated ‘highly satisfactory’. The main difference relates to HQ, with a significant reduction in the number of corporate evaluations completed in 2017.
The largest body of evaluative knowledge was generated for health and education, followed by child protection and social inclusion (see figure 17). These same areas were also most covered in 2016 evaluation reports. The priority areas for action to improve all reports are similar, with weaknesses in the articulation of human-rights based approaches (HRBAP), gender equality, ethics, and lessons learned. As expected, evaluations that successfully mainstreamed gender equality as a cross-cutting theme were also strongest regarding HRBAP and equity.
The aggregated average UN-SWAP score for integration of gender equality in 2017 was 6.15, which is classified as Approaching Requirements. This is almost the same as in the 2016 and 2015 cycles, suggesting that fully mainstreaming gender equality within the evaluation system remains a challenge. The priority for action to improve UN-SWAP performance remains to ensure that gender analysis is used to inform evaluation findings, conclusions and recommendations.
Most evaluations (82%) are managed directly by UNICEF. Of these, 76% were rated as fully meeting UNICEF standards. These patterns are almost identical to those in 2016. Once again, in 2017 there were no purely quantitative evaluations; most evaluations used mixed methods, with 17% of evaluations purely qualitative (19% in 2016). Unlike in previous years, however, there was no difference in the quality of reports between these methodological approaches.
2017 saw a large increase in the number of quasi-experimental evaluations. Project evaluations improved in both quality and number; as did strategy evaluations. Country programme evaluations and joint programme evaluations both reduced in number, but retained exactly the same level of quality as 2016. By comparison, programme and pilot/innovation evaluations reduced in both number and quality.

Conclusion 1: UNICEF evaluation reports in 2017 maintained the quality and coverage of the previous year, while being fewer in number and less strategic in scope due to an increase in the number and proportion of project evaluations.
Conclusion 2: While the integration of human rights based approaches and gender equality commitments continues to improve over time, the pace of this change is insufficient to meet UNICEF targets, including for UN-SWAP.
Conclusion 3: Inconsistency in the inclusion and quality of lessons learned has important implications for both the quality assessment of reports and the utility of evaluations.


Recommendation 1: In aligning the evaluation function to the UNICEF Strategic Plan 2018-2021, incentivize and support the use of more strategic evaluations by re-focusing away from project and output-level evaluations.
Recommendation 2: To ensure that no child is left behind and to deliver on the UNICEF equity agenda, initiate urgent action to overcome persistent bottlenecks and to strengthen the full integration of HRBAP, equity and UN-SWAP requirements in all evaluations using UN Evaluation Group guidance and good practices.
Recommendation 3: Reassess the integration of gender, human-rights and equity indicators within the GEROS assessment tool, with a view to generating more detailed insights on the bottlenecks to delivering UNICEF commitments.
Recommendation 4: Clarify UNICEF standards regarding which types of evaluations are required to include lessons learned, and facilitate knowledge exchange to better support the development and sharing of lessons.




Report information


Evaluation Office



Management Excellence (Cross-cutting)


