Interested in our methodology? You’re in the right place

The FarmFit Insights Hub is based on a unique dataset and analytical approach. It is important to us that we are transparent about how we collect data, which data we collect, how we manage it, which analyses we conduct and how, what quality control processes we have in place, and more. This page provides you with an overview of our methodology.

We welcome your comments, suggestions, and queries through our contact form as we continue to improve the quality and relevance of the FarmFit Insights Hub.

What we designed our methodology to do

To answer the overarching question for this Insights Hub, we need data from inclusive business model analyses that are relevant, credible, comparable and actionable. The methodology that we have developed and tested since 2015 is designed to meet each of these criteria.

Our data and analytics team needs to address the following requirements:

  1. Relevance: We capture the right data (qualitative and quantitative) from each model that we analyze. This means collecting both the right types of data points and the right amount of data.
  2. Credibility: Our data is of sufficient quality and allows us to conduct analyses with enough rigor to make meaningful contributions with our findings.
  3. Comparability: Our data is comparable across various models. It is measured in a standardized way so that data and insights can be compared at an aggregate level.
  4. Actionability: We have enough data that meets the previous three criteria to allow us to generate insights that are relevant and practical to the sector.

We’ve made tremendous strides in addressing each of these requirements. While there is always room for improvement, we believe our approach and methodology provide a solid foundation for meeting them as we continue to improve and evolve.

Our unique approach

To meet our requirements, we’ve designed an approach that is centered on the principles of standardization and modularization, applied throughout the data journey lifecycle. We have organized our approach into five areas, which are described below:

Analytics used in the Insights Hub

In addition to the broader approach that applies to all our work at IDH FarmFit, this section zooms in on the approach used to develop the analyses for the Insights Hub.

All the data that we collect and manage is organized into different categories. The three most important ones for the Insights Hub are:

  1. Outcome indicators: Here, we look at three outcomes: service delivery cost per farmer per year, direct cost recovery, and farmer value created. We analyze how these three outcomes change as a result of context and business model design. In the future, we may expand the number of outcome indicators that we look at to include environmental outcomes, net investment per farmer, investment per farmer as a percentage of sourcing costs, and more. Read more about the methodology behind each of the three outcome indicators: Service Delivery Cost per Farmer, Direct Cost Recovery from Services, and Farmer Value Created.
  2. Contextual drivers: These are variables that companies do not control that may influence their business models. Understanding how these drivers influence business model outcomes can help identify unique challenges and opportunities for different contexts and allow businesses, investors and others to design business models suitable to the context in which they operate.
  3. Design drivers: These are variables that companies control. More specifically, design drivers refer to how the smallholder inclusive business model is designed. They are decisions that a company makes when designing how their business will operate, such as what services to offer, how to deliver them and whether or not to charge for services. Understanding the relationships between design drivers and outcomes can provide businesses, investors and others with insights to help optimize smallholder inclusive business models for success.

The core of the Insights Hub is to understand the interrelationships between these three sets of data.

In developing our insights, we first ran exploratory data analyses (EDAs) on approximately 50 contextual and design drivers. Our Learning Framework was the basis on which we prioritized drivers, and it informed the subsequent EDA. We used a variety of EDA techniques, ranging from basic descriptive statistics and visuals that show distributions and other relationships to more complex tools such as correlation matrices. The intent was to reveal any hidden structures, detect outliers and anomalies, and understand the relationships between different drivers and outcomes.
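As an illustration of the kind of exploratory checks described above, here is a minimal sketch in Python that uses only the standard library. It computes descriptive statistics, flags outliers with the common 1.5 x IQR rule, and builds a Pearson correlation matrix. The field names, the sample values and the choice of the IQR rule are assumptions made for this example; they are not FarmFit data and not necessarily the techniques or tools the team used.

```python
import statistics
from math import sqrt

# Hypothetical records, one per business model. Field names and values are
# invented for illustration only; they are not FarmFit data.
models = [
    {"farmers_served": 1200, "services_offered": 3, "cost_per_farmer": 45.0},
    {"farmers_served": 5400, "services_offered": 5, "cost_per_farmer": 30.0},
    {"farmers_served": 800, "services_offered": 2, "cost_per_farmer": 60.0},
    {"farmers_served": 15000, "services_offered": 6, "cost_per_farmer": 22.0},
    {"farmers_served": 2500, "services_offered": 4, "cost_per_farmer": 38.0},
    {"farmers_served": 3100, "services_offered": 4, "cost_per_farmer": 210.0},
]


def column(name):
    """Return one variable as a list of values across all models."""
    return [m[name] for m in models]


def describe(values):
    """Basic descriptive statistics for one numeric variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(names):
    """Nested dict holding the pairwise Pearson correlations."""
    return {a: {b: pearson(column(a), column(b)) for b in names} for a in names}


print(describe(column("cost_per_farmer")))
print(iqr_outliers(column("cost_per_farmer")))
print(correlation_matrix(["farmers_served", "services_offered", "cost_per_farmer"]))
```

In this made-up sample, the model with a cost of 210.0 per farmer is the only value flagged as an outlier. In practice, a flagged value would be a prompt to check the underlying data rather than a reason to discard it automatically.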

We extended our EDA with the aim of understanding factors that explain variations in outcomes. We used several methods to find promising drivers. Our choice of tests varied depending on the number of data points, type of data, distribution of outcomes, and other relevant factors. For instance, to find signals between categorical drivers and a numeric outcome with a skewed distribution, we used Kruskal-Wallis and post-hoc Wilcoxon tests. To find signals between categorical drivers and normally distributed numeric outcomes, we used one-way and two-way Analysis of Variance (ANOVA). To find signals between a mixed set of drivers and a numeric outcome, we used linear regression. This multifaceted approach helped us to take a closer look at promising drivers that warranted further in-depth analysis. We prioritized 15 drivers for further analysis from this exercise.
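To make one of the tests named above concrete, the following is a minimal sketch of the Kruskal-Wallis H statistic for a categorical driver and a skewed numeric outcome, written in plain Python. The group labels and values are hypothetical, and the closed-form p-value used here holds only for the three-group case (two degrees of freedom) under the usual chi-square approximation, which is rough for groups this small. This is an illustration of the technique, not the team's code; in practice an established statistics package would normally be used.

```python
from collections import Counter, defaultdict
from math import exp


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the standard tie correction.

    groups: dict mapping a category label to a list of outcome values.
    """
    labels, pooled = [], []
    for label, values in groups.items():
        for v in values:
            labels.append(label)
            pooled.append(v)
    n = len(pooled)
    rank_sums = defaultdict(float)
    for label, r in zip(labels, average_ranks(pooled)):
        rank_sums[label] += r
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(groups[g]) for g in groups
    ) - 3 * (n + 1)
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction


def chi2_sf_df2(x):
    """Chi-square survival function for exactly 2 degrees of freedom."""
    return exp(-x / 2)


# Hypothetical outcome values grouped by a hypothetical categorical driver.
groups = {
    "channel_a": [10, 12, 15, 11, 90],
    "channel_b": [20, 25, 22, 30, 150],
    "channel_c": [40, 45, 50, 48, 300],
}

h = kruskal_wallis_h(groups)
print(f"H = {h:.2f}, approximate p = {chi2_sf_df2(h):.3f}")
```

Because the test works on ranks rather than raw values, the single very large value in each hypothetical group does not dominate the result, which is why a rank-based test suits skewed outcomes.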

Our first set of Insights Hub analyses focuses on the five drivers that yield the most important insights, judged by the strength of the underlying data.

For each driver, we conducted five types of analyses:

  1. Driver versus outcome analysis
  2. Multiple drivers versus an outcome
  3. Machine Learning validation
  4. Qualitative insights
  5. Validation

Confidence ratings

The FarmFit Insights Hub is meant as a credible resource providing data-informed analyses and insights. In order to be credible and for users of the Hub to be able to best use our data and insights, we are transparent about the confidence in, and limitations of, the data that we use and the insights that we derive. In the Insights Hub, we use two indicators of our confidence in data and insights:

Strength of Relationship: Examines the statistical relationship between a driver and an outcome variable. A score is given out of five based on the statistical significance of a relationship; the consistency of our findings across different analytical approaches employed; generalizability of the data; and whether there are any limitations with regards to the underlying data points used (e.g., subjectivity).

Confidence in our Findings: This indicator focuses on the causal explanations of the relationship between drivers and outcome variables. A score is given out of five based on the breadth, depth and consistency of the evidence used (both quantitative and qualitative); whether the explanation is corroborated (or contradicted) by third party literature; and the level of transferability/generalizability.

External Advisory Group

We are grateful to have the support of a group of external advisors who have provided us with valuable input during the process. These 10 external advisors represent our different target audiences, including global and local off-takers, consumer-facing brands, donors, research/academia, social enterprises and (impact) investors. They have provided us with feedback throughout the process of developing the Insights Hub.

Notes and Limitations

Given our objectives and the approach described above, there are, of course, limitations to our findings. These have been noted throughout the Insights Hub where relevant.

Several general limitations are noted here.

Inclusive business model sample

Our database contains data on more than 100 inclusive business models, making it the largest database of its kind that we know of. Although this surpasses the number of models usually included in analyses that we and others have done in the past, the main limitation is whether insights from a sample of this size can be relevant to the full number of inclusive business models currently in existence. Nonetheless, we are confident that our sample is not only unique in scale; it is also sufficient to begin to draw valuable insights and analyses, even if some of these insights may need further illumination.

Sample biases

Our sample of inclusive business models contains biases, including the following (this list is not exhaustive):

  1. Most of the models are located in Africa
  2. Value chains are mostly limited to those that IDH focuses on
  3. Models are concentrated in English-speaking countries
  4. We conduct analyses with companies that are willing and/or able to participate, and based on their willingness or ability to co-invest in the analyses and/or TA projects
  5. A screening process filters out companies that do not meet a threshold of maturity
  6. We mainly choose (explicitly or implicitly) as our private sector partners companies that are interested in achieving commercial viability, which may not be representative of all models out there

Small subsets

When we begin to consider smaller subsets of data by grouping or filtering inclusive business models using certain characteristics, the sample numbers fall quite dramatically. For instance, when there are only five models included in a group, how much can we really say about that group?

One mitigating measure is that employing machine learning techniques can help in isolating the effect of variables, all other factors remaining equal. Another mitigating factor is that our database will continue to grow over time.
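The small-subset concern can be made concrete with a short calculation. The sketch below compares the half-width of a 95% confidence interval for a group mean at different group sizes, using standard Student's t critical values. It assumes roughly normally distributed data and the same sample standard deviation in every group; both are simplifying assumptions for illustration, and the group sizes other than five are hypothetical.

```python
from math import sqrt

# Two-sided 95% critical values of Student's t for df = n - 1,
# taken from standard t tables (keys are group sizes n).
T_CRIT_95 = {5: 2.776, 10: 2.262, 30: 2.045, 100: 1.984}


def ci_half_width(sample_sd, n):
    """Half-width of the 95% confidence interval for a mean of n values."""
    return T_CRIT_95[n] * sample_sd / sqrt(n)


# With the standard deviation fixed at 1, the half-width shows how many
# standard deviations of uncertainty surround the estimated group mean.
for n in sorted(T_CRIT_95):
    print(f"n = {n:3d}: mean +/- {ci_half_width(1.0, n):.2f} standard deviations")
```

Under these assumptions, a group of five models carries an interval more than six times wider than a group of 100, which is why findings for small subsets are treated with caution.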

Projected performance

Many inclusive business model analyses include forward-looking projections of performance. The models are often based on a few years of historical data, which are projected a few years into the future to determine the business case for possible intervention.

With this in mind, it is important to note that the indicators we use for our quantitative analyses are not solely based on measured results.

Projections are made for both company and farmer performance. Company performance data often includes a few years of historical data on which to base future assumptions, but for farmer performance, this data is much harder to obtain and the expected results are more difficult to model.

For this reason, we treat expectations of farm-level performance with more caution and a higher degree of uncertainty. 

One way in which we seek to mitigate the limitations of using projected data is to conduct a second business model analysis of a selection of the inclusive business models we have already analyzed. We use this data to better understand how the initial analyses fared against what actually happened. We expect to conduct the first of these follow-up analyses in late 2023.

Our richest source of qualitative data comes from 17 companies, which are a subset of the 100+ business models that we have analyzed to date. We work closely with these 17 companies and provide them with technical assistance. This helps us generate in-depth insights related to the “how” of service delivery. However, the 17 companies to which we provide technical assistance are not a representative sample of the 100+ business models analyzed. For instance, these 17 companies are mainly active in food crops and loose value chains, meaning that we have fewer qualitative insights on cash crop and tight value chain business models. Similarly, only 3 of the 17 are global off-takers, meaning we have limited insights on these types of companies.

Software citations

We used several open-source software packages in our quantitative analyses: