Methodology
Interested in our methodology? You’re in the right place
The FarmFit Insights Hub is based on a unique dataset and analytical approach. It is important to us that we are transparent in how we collect data, which data we collect, how we manage it, which analyses we conduct and how we do so, what quality control processes we have, and more. This page provides you with an overview of our methodology.
We welcome your comments, suggestions, and queries through our contact form as we continue to improve the quality and relevance of the FarmFit Insights Hub.
What we designed our methodology to do
To answer the
Our data and analytics team needs to address the following requirements:
- Relevance: We capture the right data (qualitative and quantitative) from each model that we analyze. This means collecting both the right types of data points and the right amount of data.
- Credibility: Our data is of sufficient quality and allows us to conduct analyses with enough rigor to make meaningful contributions with our findings.
- Comparability: Our data is comparable across various models. It is measured in a standardized way so that data and insights can be compared at an aggregate level.
- Actionability: We have enough data that meets the previous three criteria to allow us to generate insights that are relevant and practical to the sector.
We’ve made tremendous strides in addressing each of these requirements. While there is always room for improvement, we believe our approach and methodology provide a solid foundation for meeting them as we continue to evolve.
Our unique approach
To meet our requirements, we’ve designed an approach that is centered on the principles of standardization and modularization, applied throughout the data journey lifecycle. We have organized our approach into five areas, which are described below:
Our Learning Framework provides us with a strategic direction and guides all our work. It is made up of a set of structured research questions that detail our overarching question, sub-questions and hypotheses. These are translated into metrics and data collection requirements.
The Learning Framework guides our selection of business models to analyze, sets out the learning goals for individual engagements and ensures that our specialized teams of Inclusive Business Model Analysts, Innovation Managers, Technical Assistance managers, Data Professionals and Domain Experts can work with a high degree of interoperability.
We continually refine data collection procedures, interview guides, modelling approaches, and internal learning and collaboration processes. We collect data from multiple sources and in a variety of formats. These include desk-based literature reviews, company document reviews, focus group interviews, company financial data, key person interviews, and farm-level household surveys.
Our analysts conducting the inclusive business model analyses undergo extensive training and follow an internal graduation model to ensure high quality and conformity when executing analyses based on our methodology.
Each inclusive business model analysis is a one-time assessment. To better understand causality and the mechanisms of change over time, we collect additional data and information during the Technical Assistance phase of a subset of these models. In future, we will also conduct a follow-up inclusive business model analysis of some models. We will revisit them several years after our initial analysis in order to refine and validate our findings. These follow-up analyses will also allow us to compare individual business models at two different points in time.
Public versions of all our business model assessments can be found on our Resources page.
All quantitative and qualitative data collection happens in a semi-standardized manner via forms and templates, and is carried out largely by internal FarmFit staff. In some cases, data collection is complemented by partners and external consultants. Data is organized in line with our Learning Framework, which facilitates high-quality and relevant analyses.
You can see a copy of our Indicator template, which collects key raw data from our business model analyses.
Our farmer survey question library provides the template for collecting standardized data across our inclusive business model analyses and TA engagements. For more information on our farmer survey data, please read our publication, Lessons Learned: How to use farmer data effectively, and learn how we conduct our surveys.
These semi-standardized data collection methods are the foundation that allows us to compare information at an aggregate level. As each data point is defined, collected and calculated in a common way, information can be compared and aggregated.
For our inclusive business model data, we collect quantitative data at the lowest level of disaggregation feasible and conduct all aggregations and KPI calculations via our calculation engine. This ensures all calculations are carried out to strict specifications, with exceptions handled in a consistent manner, so that high-quality data collection is not undermined by inconsistent downstream calculations.
In total, we’ve automated approximately 60,000 calculations that would otherwise need to be manually verified for consistency.
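As a minimal sketch of this idea (not the FarmFit calculation engine itself), the Python below aggregates line-item cost records into a per-farmer KPI through one shared function, so that the exception rule is applied identically everywhere. The record fields, the KPI and the rule that a missing or zero farmer count yields `None` are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical line-item records, stored at the lowest level of disaggregation.
RECORDS = [
    {"model": "A", "service": "training", "cost": 1200.0},
    {"model": "A", "service": "inputs", "cost": 800.0},
    {"model": "B", "service": "training", "cost": 500.0},
]
FARMERS = {"A": 100, "B": 0}  # assumed: model B has no usable farmer count


def safe_divide(numerator: float, denominator: Optional[float]) -> Optional[float]:
    """The single place where the missing-value / divide-by-zero rule lives."""
    if not denominator:
        return None
    return numerator / denominator


def cost_per_farmer(records, farmers):
    """Sum line items per model, then derive the KPI with the shared rule."""
    totals = defaultdict(float)
    for row in records:
        totals[row["model"]] += row["cost"]
    return {m: safe_divide(total, farmers.get(m)) for m, total in totals.items()}


kpis = cost_per_farmer(RECORDS, FARMERS)
print(kpis)
```

Because every KPI goes through the same helper, changing how an exception is handled is a one-line change rather than a search through many separate calculations.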
Our approach to collecting qualitative data includes narrative reports, interviews, focus groups and observations. We collect qualitative data both during our standardized business model assessment and continuously, over a subset of projects through which we provide technical assistance.
Our quantitative analyses are conducted using R statistical software and tracked with version control. Conducting reproducible analyses aids us in efficiently iterating our approach in an environment where there are many moving parts in methodology development, data quality improvements, increasing data availability and a complex set of interconnecting analyses.
Most of the qualitative data collected in the course of our inclusive business model analyses and TA engagements is coded against the outcomes and drivers in our Learning Framework. This allows for the systematic analysis of qualitative data to produce insights that can be used to explain, guide and cross reference against our quantitative analyses. We use this qualitative data to explain and validate our analyses in the Insights Hub, and to provide real, practical examples to illustrate our findings.
We expand further on the approach used in the Insights Hub below.
Creating our knowledge products, and this Insights Hub, specifically, has included multiple rounds of internal and external review, including input and validation from external advisors and key partners.
We’ve paid special attention to delivering complex and interconnected ideas in several formats that help users quickly find what they are looking for.
With this Insights Hub, we’ve adopted novel and user-friendly ways of condensing large amounts of information. In contrast to linear reports that have a predetermined order and are the same for every reader, our Insights Hub is designed to be interactive. Readers can pick and choose the information they want to see, dive deeper or skip ahead as desired, while balancing narrative and data-driven methods of analysis. We will continue to update this website with new insights, advice and guidance.
Analytics used in the Insights Hub
Besides the overview of our broader approach that applies to all our work at IDH FarmFit, here we zoom in on the approach used to develop the analyses for the Insights Hub.
All the data that we collect and manage is organized into different categories. The three most important ones for the Insights Hub are:
- Outcome indicators: Here, we look at three outcomes: service delivery cost per farmer per year, direct cost recovery, and farmer value created. We analyze how these three outcomes change as a result of context and business model design. In the future, we may expand the number of outcome indicators that we look at to include environmental outcomes, net investment per farmer, investment per farmer as a percentage of sourcing costs, and more. Click here to read more about the methodology behind each of the three outcome indicators: Service Delivery Cost per Farmer, Direct Cost Recovery from Services, and Farmer Value Created.
- Contextual drivers: These are variables that companies do not control that may influence their business models. Understanding how these drivers influence business model outcomes can help identify unique challenges and opportunities for different contexts and allow businesses, investors and others to design business models suitable to the context in which they operate.
- Design drivers: These are variables that companies control. More specifically, design drivers refer to how the smallholder inclusive business model is designed. They are decisions that a company makes when designing how their business will operate, such as what services to offer, how to deliver them and whether or not to charge for services. Understanding the relationships between design drivers and outcomes can provide businesses, investors and others with insights to help optimize smallholder inclusive business models for success.
The core of the Insights Hub is to understand the interrelationships between these three sets of data.
In developing our insights, we first ran exploratory data analyses (EDAs) on approximately 50 contextual and design drivers. Our Learning Framework was the basis from which we prioritized different drivers and informed subsequent EDA. We used a variety of EDA techniques, ranging from basic descriptive statistics, and visuals that show distributions and various other relationships, to more complex tools such as correlation matrices. The intent was to reveal any hidden structures, detect outliers and anomalies, and understand the relationships between different drivers and outcomes.
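The kind of checks described above can be sketched in a few lines. This is an illustration only, using Python's standard library rather than the R tooling used for the actual analyses; the driver and outcome values are invented, and the 1.5 × IQR outlier rule is one common convention assumed here, not necessarily the one applied to the FarmFit data.

```python
import math
import statistics

# Hypothetical values for one numeric driver and one outcome across eight models.
driver = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
outcome = [12.0, 15.0, 14.0, 18.0, 21.0, 20.0, 24.0, 60.0]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def iqr_outliers(values):
    """Return values lying more than 1.5 IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


summary = {
    "mean": statistics.mean(outcome),
    "median": statistics.median(outcome),
    "stdev": statistics.stdev(outcome),
}
r = pearson(driver, outcome)
outliers = iqr_outliers(outcome)
print(summary, round(r, 3), outliers)
```

In this toy data the gap between mean and median already hints at a skewed outcome, which is the sort of signal that informs the choice of statistical test in the next step.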
We extended our EDA with the aim of understanding factors that explain variations in outcomes. We used several methods to find promising drivers. Our use of tests varied depending on the number of data points, type of data, distribution of outcomes, and other relevant factors. For instance, to find signals between categorical drivers and a numeric outcome with a skewed distribution, we used Kruskal-Wallis and post-hoc Wilcoxon tests. To find signals between categorical drivers and normally distributed numeric outcomes, we used one-way and two-way Analysis of Variance (ANOVA). To find signals between a mixed set of drivers and a numeric outcome, we used linear regression. This multifaceted approach helped us take a closer look at promising drivers that warranted further in-depth analysis. We prioritized 15 drivers for further analysis from this exercise.
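To make the first of these tests concrete, here is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. It is an illustration, not the R code used for the Insights Hub: the group labels and values are invented, no tie correction is applied, and the comparison against the chi-square critical value of 5.991 (two degrees of freedom, 5% level) is only an approximation for groups this small.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic over a list of samples (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical outcome values grouped by a three-level categorical driver.
samples = {
    "level_1": [12.0, 15.0, 11.0, 18.0],
    "level_2": [22.0, 25.0, 19.0, 30.0],
    "level_3": [40.0, 35.0, 28.0, 45.0],
}
h_stat = kruskal_wallis_h(list(samples.values()))
signal = h_stat > 5.991  # approximate 5% threshold for 3 groups
print(round(h_stat, 3), signal)
```

A value of H above the threshold only indicates that at least one group differs; identifying which pairs differ is the job of the post-hoc tests mentioned above.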
Our first set of Insights Hub analyses is focused on the five drivers that have the most important insights in terms of the strength of the data.
For each driver, we conducted five types of analyses:
- Driver versus outcome analysis
- Multiple drivers versus an outcome
- Machine Learning validation
- Qualitative insights
- Validation
Confidence ratings
The FarmFit Insights Hub is meant as a credible resource providing data-informed analyses and insights. In order to be credible and for users of the Hub to be able to best use our data and insights, we are transparent about the confidence in, and limitations of, the data that we use and the insights that we derive. In the Insights Hub, we use two indicators of our confidence in data and insights:
Strength of Relationship: Examines the statistical relationship between a driver and an outcome variable. A score is given out of five based on the statistical significance of a relationship; the consistency of our findings across different analytical approaches employed; generalizability of the data; and whether there are any limitations with regards to the underlying data points used (e.g., subjectivity).
Confidence in our Findings: This indicator focuses on the causal explanations of the relationship between drivers and outcome variables. A score is given out of five based on the breadth, depth and consistency of the evidence used (both quantitative and qualitative); whether the explanation is corroborated (or contradicted) by third party literature; and the level of transferability/generalizability.
External Advisory Group
We are grateful to have the support of a group of external advisors who have provided us with valuable input during the process. These 10 external advisors represent our different target audiences, including global and local off-takers, consumer-facing brands, donors, research/academia, social enterprises and (impact) investors. They have provided us with feedback throughout the process of developing the Insights Hub.
Notes and Limitations
Given our objectives and the approach described above, there are, of course, limitations to our findings. These have been noted throughout the Insights Hub where relevant.
Several general limitations are noted here.
Inclusive business model sample
Our database contains data on more than 100 inclusive business models, making it the largest database of its kind that we know of. Although this surpasses the number of models usually included in analyses that we and others have done in the past, the main limitation is whether insights from a sample of this size can be generalized to the full population of inclusive business models currently in existence. Nonetheless, we are confident that our sample is not only unique in scale; it is also sufficient to begin to draw valuable insights and analyses, even if some of these insights may need further investigation.
Sample biases
Our sample of inclusive business models contains the following non-exhaustive biases:
- Most of the models are located in Africa
- Value chains are mostly limited to those that IDH focuses on
- Models are concentrated in English-speaking countries
- We conduct analyses with companies that are willing and/or able to participate, and based on their willingness or ability to co-invest in the analyses and/or TA projects
- A screening process filters out companies that do not meet a threshold of maturity
- We mainly choose (explicitly or implicitly) as our private sector partners companies that are interested in achieving commercial viability. This may not be representative of all models out there
Small subsets
When we begin to consider smaller subsets of data by grouping or filtering inclusive business models using certain characteristics, the sample numbers fall quite dramatically. For instance, when there are only five models included in a group, how much can we really say about that group?
One mitigating measure is that employing machine learning techniques can help in isolating the effect of variables, all other factors remaining equal. Another mitigating factor is that our database will continue to grow over time.
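The source does not name the specific technique, but one common way to isolate a variable's effect from a fitted model is partial dependence: fix the variable of interest at each candidate value for every observation, leave all other variables as observed, and average the predictions. The sketch below assumes this technique; `toy_model`, the feature names and the data rows are hypothetical stand-ins for a trained model and real observations.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        # Override only the feature of interest; other features stay as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Stand-in for a fitted model; a real analysis would use a trained learner."""
    return 50.0 - 2.0 * row["farmers_per_agent"] / 10 + 5.0 * row["services"]


# Hypothetical observations of two drivers for three business models.
rows = [
    {"farmers_per_agent": 50, "services": 1},
    {"farmers_per_agent": 100, "services": 3},
    {"farmers_per_agent": 150, "services": 2},
]
curve = partial_dependence(toy_model, rows, "services", [1, 2, 3])
print(curve)
```

The resulting curve describes the model's average response to the chosen driver. It inherits any weakness of the underlying model, so with very small subsets it should be read as indicative rather than conclusive.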
Projected performance
Many inclusive business model analyses include forward-looking projections of performance. The models are often based on a few years of historical data, which are projected a few years into the future to determine the business case for possible intervention.
With this in mind, it is important to note that the indicators we use for our quantitative analyses are not solely based on measured results.
Projections are done both for company and farmer performance. Company performance data often contains a few years of historical data to base future assumptions on, but for farmer performance, this data is much harder to obtain and the expected results are more difficult to model.
For this reason, we treat expectations of farm-level performance with more caution and a higher degree of uncertainty.
One way in which we seek to mitigate the limitations of using projected data is to conduct a second business model analysis of a selection of the inclusive business models we have already analyzed. We use this data to better understand how the initial projections compared with what actually happened. We expect to conduct the first of these follow-up analyses in late 2023.
Our richest source of qualitative data comes from 17 companies, which are a subset of the 100+ business models that we have analyzed to date. We work closely with these 17 companies and provide them with technical assistance. This helps us generate in-depth insights related to the how of service delivery. However, the 17 companies to which we provide technical assistance are not a representative sample of all the 100+ business models analyzed. For instance, these 17 companies are mainly active in food crops and loose value chains, meaning that we have fewer qualitative insights on cash crop and tight value chain business models. Similarly, only 3 of the 17 are global off-takers, meaning we have limited insights on these types of companies.
Software citations
We used several open-source software packages in our quantitative analyses:
Allaire, JJ. 2022. quarto: R Interface to “Quarto” Markdown Publishing System. https://CRAN.R-project.org/package=quarto.
Cengic, Mirza. 2023. farmfitR: IDH Farmfit Internal Package.
Greenwell, Brandon, Bradley Boehmke, Jay Cunningham, and GBM Developers. 2022. gbm: Generalized Boosted Regression Models. https://CRAN.R-project.org/package=gbm.
Kuhn, Max, and Hadley Wickham. 2020. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. https://www.tidymodels.org.
Landau, William Michael. 2021a. tarchetypes: Archetypes for Targets.
Landau, William Michael. 2021b. “The Targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.
Masterson, Gavin, Andrew B. Collier, and Megan Beckett. 2023. farmfit: A Package for Developing and Maintaining the IDH Farmfit Database.
Molnar, Christoph, Bernd Bischl, and Giuseppe Casalicchio. 2018. “iml: An R Package for Interpretable Machine Learning.” Journal of Open Source Software 3 (26): 786. https://doi.org/10.21105/joss.00786.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
R Special Interest Group on Databases (R-SIG-DB), Hadley Wickham, and Kirill Müller. 2022. DBI: R Database Interface. https://CRAN.R-project.org/package=DBI.
Ushey, Kevin. 2022. renv: Project Environments. https://CRAN.R-project.org/package=renv.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2022. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Arnold, Jeffrey B. 2021. ggthemes: Extra Themes, Scales and Geoms for “ggplot2”. https://CRAN.R-project.org/package=ggthemes.
Blondel, Emmanuel. 2023. rsdmx: Tools for Reading SDMX Data and Metadata.
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2022. shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2022. htmltools: Tools for HTML. https://CRAN.R-project.org/package=htmltools.
Clarke, Erik, Scott Sherrill-Mix, and Charlotte Dawson. 2022. ggbeeswarm: Categorical Scatter (Violin Point) Plots. https://CRAN.R-project.org/package=ggbeeswarm.
Cuilla, Kyle. 2022. reactablefmtr: Streamlined Table Styling and Formatting for Reactable. https://CRAN.R-project.org/package=reactablefmtr.
Firke, Sam. 2021. janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Gohel, David, and Panagiotis Skintzos. 2023. ggiraph: Make “ggplot2” Graphics Interactive. https://CRAN.R-project.org/package=ggiraph.
Klik, Mark. 2022. fst: Lightning Fast Serialization of Data Frames. https://CRAN.R-project.org/package=fst.
Lin, Greg. 2023. reactable: Interactive Data Tables for R. https://CRAN.R-project.org/package=reactable.
Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert, and Kurt Hornik. 2022. cluster: Cluster Analysis Basics and Extensions. https://CRAN.R-project.org/package=cluster.
Müller, Kirill. 2020. here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Ooi, Hong. 2023. Microsoft365R: Interface to the “Microsoft 365” Suite of Cloud Services. https://github.com/Azure/Microsoft365R, https://github.com/Azure/AzureR.
Ooms, Jeroen. 2023. writexl: Export Data Frames to Excel “xlsx” Format. https://CRAN.R-project.org/package=writexl.
Pedersen, Thomas Lin. 2022. patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.
Schloerke, Barret, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Jason Crowley. 2021. GGally: Extension to “ggplot2”. https://CRAN.R-project.org/package=GGally.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.
Silge, Julia, and David Robinson. 2016. “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” Journal of Open Source Software 1 (3). https://doi.org/10.21105/joss.00037.
Skinner, Benjamin. 2021. duawranglr: Securely Wrangle Dataset According to Data Usage Agreement. https://CRAN.R-project.org/package=duawranglr.
Wickham, Hadley, Max Kuhn, and Davis Vaughan. 2022. generics: Common S3 Generics Not Provided by Base R Methods Related to Model Fitting. https://CRAN.R-project.org/package=generics.
Wickham, Hadley, and Dana Seidel. 2022. scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
Xie, Yihui. 2022. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.
Yutani, Hiroaki. 2022. gghighlight: Highlight Lines and Points in “ggplot2”. https://CRAN.R-project.org/package=gghighlight.