Participatory predictive modeling

Contents

  1. Types of data and analyses to think about
  2. Why take a participatory approach?
  3. Collecting stakeholder generated data
  4. Aggregating collected data
  5. Participation in the analysis
  6. Annexes

1. Types of data and analyses to think about

Data sets that can be analysed by EvalC3 can come from different sources: a once-off research or evaluation exercise or from ongoing monitoring systems. And as explained below, they can also be generated by participatory means.

Two types of data can be generated by participatory means: (a) outcome data, describing the outcomes that a good predictive model should be able to identify, and (b) attribute data, some of which may be predictive of those outcomes. Analysing which attributes predict a given outcome is known as a “causes of an effect” analysis. It is the default form of analysis assumed within the design of EvalC3.

2. Why take a participatory approach?

Stakeholders in a project, such as those implementing the intervention and those experiencing its effects, are likely to have views about what works and what does not work. These views may be much wider ranging, and sometimes closer to the truth, than the contents of official monitoring and evaluation systems. It is worth tapping into those views.

3. Collecting stakeholder generated data

By stakeholder generated data I mean data that captures stakeholders’ own view of what is important, rather than a predefined perspective set up by other parties.

Pile sorting is a long-established form of ethnographic inquiry that enables us to identify participants’ view of the world, primarily the categories they use to describe their world. There are many different ways of doing pile sorting, well summarised in Harloff and Coxon’s 2007 “How to Sort” guide. The method I recommend using here is a binary sort. It involves the following steps:

  1. Participants are presented with a list of events, activities, people or objects that they are familiar with. This set should be diverse but have something in common. Ideally, all the cases in the set should be known to the respondent, at least to some degree. For example, asking someone in India to sort the top 10 cities in Uruguay would not usually make much sense.
  2. Participants are asked to sort them into two piles of any size. In doing so they are asked to focus on “What you see as the most significant difference between these cases”. The ambit of that assessment can be defined by adding a suffix to this question. For example:
    1. “the most significant difference in terms of what they have been able to achieve”.
    2. “the most significant difference in terms of the kind of interventions involved”.
  3. They are asked to label each pile with a description of what the items in that pile have in common, but which makes them different from the items in the other pile.
  4. They are asked to explain why that difference is significant to the respondent. What difference has it made, or might it make in future?
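The information produced by these four steps can be recorded in a simple structure, one record per participant. The following is a minimal sketch only: the class name, field names and example projects are all made up for illustration and are not part of EvalC3.

```python
from dataclasses import dataclass


@dataclass
class BinarySort:
    """One participant's binary sort (all names here are illustrative)."""
    participant: str
    question_suffix: str   # step 2: the ambit added to the sorting question
    label_pile_1: str      # step 3: what the cases in pile 1 have in common
    label_pile_0: str      # step 3: what the cases in pile 0 have in common
    pile_1: set            # cases placed in the first pile
    pile_0: set            # cases placed in the second pile
    significance: str      # step 4: why the difference matters

    def problems(self, cases):
        """List recording problems; an empty list means every case is in exactly one pile."""
        found = []
        for case in sorted(self.pile_1 & self.pile_0):
            found.append(f"{case} is in both piles")
        for case in sorted(set(cases) - self.pile_1 - self.pile_0):
            found.append(f"{case} was not sorted")
        for case in sorted((self.pile_1 | self.pile_0) - set(cases)):
            found.append(f"{case} is not in the case list")
        return found


# Hypothetical cases and a hypothetical response, for illustration only.
cases = ["Project A", "Project B", "Project C", "Project D"]

example_sort = BinarySort(
    participant="Respondent 1",
    question_suffix="in terms of what they have been able to achieve",
    label_pile_1="Reached people outside the original target group",
    label_pile_0="Stayed within the original target group",
    pile_1={"Project A", "Project C"},
    pile_0={"Project B", "Project D"},
    significance="Wider reach may make the results easier to sustain",
)

print(example_sort.problems(cases))
```

Keeping the pile labels and the explanation of significance alongside the piles themselves means they are still available when the sorts are later coded as 1s and 0s.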

Pile sorting is typically done with one or more respondents in a face-to-face meeting, but it can also be done more collaboratively with a small group in a workshop setting. Online survey tools, such as SurveyMonkey, are another means of eliciting these kinds of sorting results and judgments from participants. Online surveys are more suitable when dealing with larger numbers of respondents and/or respondents based in many different locations. Figure 1 shows the layout of a sorting question that I have used in an online survey.

Figure 1

4. Aggregating collected data

Regardless of which method is used (face-to-face or online survey), the results from all the participants’ sorts then need to be aggregated into one matrix. In that matrix, each row represents a case (i.e. a sorted item) and each column represents one participant’s sort. The cell values of 1 and 0 in a column represent the two piles that a case can belong to. The description and explanation of each pile should be retained, to explain the meaning of those values. The matrix of sort data can then be imported into EvalC3 for analysis. While the file can contain multiple attributes and multiple outcomes, once it is imported choices need to be made within the Select Data view as to which outcome is for immediate analysis and which, if any, of the attributes can be ignored.
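The aggregation step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names, column headers and example sorts are hypothetical, and the CSV output is used only to show the shape of the matrix, since the exact file layout that EvalC3 expects on import is not described here.

```python
import csv
import io


def build_sort_matrix(cases, sorts):
    """Build a cases-by-sorts matrix of 1s and 0s.

    `sorts` maps a column name (one per participant's sort) to the set of
    cases that the participant placed in the pile coded as 1. Every other
    case is assumed to be in the pile coded as 0.
    """
    header = ["case"] + list(sorts)
    rows = []
    for case in cases:
        rows.append([case] + [1 if case in pile_1 else 0 for pile_1 in sorts.values()])
    return header, rows


def to_csv(header, rows):
    """Render the matrix as CSV text (an assumed, illustrative layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical cases and sorts, for illustration only.
cases = ["Project A", "Project B", "Project C", "Project D"]
sorts = {
    "R1_achieved_most": {"Project A", "Project C"},
    "R2_used_training": {"Project A", "Project B"},
}

header, rows = build_sort_matrix(cases, sorts)
print(to_csv(header, rows))
```

The pile descriptions and explanations would need to be kept in a separate lookup, keyed by the same column names, so that the meaning of each 1 and 0 is not lost.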

5. Participation in the analysis

The best way to involve participants in the analysis is in a workshop setting, where they can be brought together to view the data and take part in its analysis. There are probably many ways of doing this, but here are two suggestions:

  1. Ask participants to put forward their ideas about what attributes will make a difference to what outcomes. Then the workshop facilitator would input these into the Design And Evaluate view, and then explain the Confusion Matrix results. Improvements to the model could then be discussed and tested, on an iterative basis.
  2. Present to the participants the best models that the workshop facilitator and their colleagues have found to date. Then prompt a discussion about how, if at all, the attributes could be causally linked to the outcome. In doing so, focus the discussion where possible on real-life case examples, where these ideas could be verified as happening in practice.
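To help explain a Confusion Matrix to workshop participants, it can be useful to show how one is counted by hand. The sketch below is not EvalC3's own implementation. It assumes a deliberately simple kind of model, in which the outcome is predicted to be present only when all the attributes proposed by participants are present, and the attribute names and case data are invented for illustration.

```python
def confusion_matrix(cases, required_attributes, outcome):
    """Count true/false positives and negatives for a simple model.

    The model predicts the outcome is present when every attribute in
    `required_attributes` has the value 1 for that case.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for case in cases:
        predicted = all(case[name] == 1 for name in required_attributes)
        actual = case[outcome] == 1
        if predicted and actual:
            counts["TP"] += 1   # model says present, outcome is present
        elif predicted and not actual:
            counts["FP"] += 1   # model says present, outcome is absent
        elif not predicted and actual:
            counts["FN"] += 1   # model says absent, outcome is present
        else:
            counts["TN"] += 1   # model says absent, outcome is absent
    return counts


def accuracy(counts):
    """Share of all cases that the model classified correctly."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0


# Hypothetical cases: two attributes and one outcome, coded 1 or 0.
example_cases = [
    {"training": 1, "local_partner": 1, "achieved": 1},
    {"training": 1, "local_partner": 0, "achieved": 0},
    {"training": 0, "local_partner": 1, "achieved": 1},
    {"training": 1, "local_partner": 1, "achieved": 0},
    {"training": 0, "local_partner": 0, "achieved": 0},
]

result = confusion_matrix(example_cases, ["training", "local_partner"], "achieved")
print(result, accuracy(result))
```

The false positive and false negative cases are often the most useful ones to discuss with participants, because they are where the proposed model and the actual experience disagree.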

6. Annexes

6.1 Other sources of information …

Hierarchical Card Sorting (HCS): A simple tool for qualitative research and inquiry, also useful for planning and evaluation. This starts with a simple binary sort, as described above.

Network visualisation of qualitative data: Combining the use of card/pile sorting and network visualisation software.

Gravlee, Clarence C., Chad R. Maxwell, Aryeh Jacobsohn, and H. Russell Bernard. ‘Mode Effects in Cultural Domain Analysis: Comparing Pile Sort Data Collected via Internet versus Face-to-Face Interviews’. International Journal of Social Research Methodology 21, no. 2 (4 March 2018): 165–76. https://doi.org/10.1080/13645579.2017.1341187.

6.2 “Effects of a cause” analyses

As noted above, the default kind of analysis carried out by EvalC3 is a “causes of an effect” analysis. However, there is a second kind of analysis, known as an “effects of a cause” analysis. Here the balance of the data needed is reversed: (a) descriptions of multiple types of outcomes, each identified as present or absent in each of a set of cases, and (b) one attribute describing an expected cause of interest. This approach is especially relevant in the analysis of decentralised programs, where the range of outcomes from an intervention can be large and can include both the expected and the unexpected, some of which may also arise from other causes.

Data collection via card sorting is essentially the same as described above. But there is likely to be a focus on a greater number of outcomes and a smaller number of attributes (potential causes of those outcomes). The data matrix would take the same basic form.

However, once the data is imported into EvalC3 it needs to be treated differently within the Select Data view. The structure of the imported matrix needs to be edited: all the outcomes need to be relabelled as attributes and all the attributes need to be relabelled as outcomes. The analysis then focuses on the one known cause (now labelled as the outcome) and on the multiple outcomes (now labelled as attributes) as its possible effects. This approach is possible because the software is agnostic as to what is categorised as attribute and outcome. There are no inbuilt assumptions about the direction of time or causality.
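The relabelling itself is a simple swap of roles. In EvalC3 it is done by hand in the Select Data view; the sketch below only illustrates the logic, using a hypothetical mapping of column names to roles.

```python
def swap_roles(column_roles):
    """Relabel every attribute as an outcome and every outcome as an attribute.

    Columns with any other role (for example a case identifier) are left as they are.
    """
    swapped = {"attribute": "outcome", "outcome": "attribute"}
    return {column: swapped.get(role, role) for column, role in column_roles.items()}


# Hypothetical columns: one expected cause and several observed outcomes.
roles_as_collected = {
    "case": "id",
    "received_grant": "attribute",
    "income_increased": "outcome",
    "school_attendance_up": "outcome",
    "new_groups_formed": "outcome",
}

roles_for_analysis = swap_roles(roles_as_collected)
print(roles_for_analysis)
```

Applying the same swap a second time returns the original labels, which reflects the point that the distinction between attribute and outcome is only a labelling choice as far as the software is concerned.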

Goertz, Gary, and James Mahoney. ‘Causes-of-Effects versus Effects-of-Causes’, 2012. https://doi.org/10.23943/princeton/9780691149707.003.0003.
