
A Vision for Philanthrolytics: Visualizing Data Pipelines for MIT Solve

An early whitepaper of our vision for what Philanthrolytics can be

Published on Sep 21, 2021

Introduction

As the Open Grants Commons builds relationships with different organizations, it is imperative that we show how the data we collect can be turned into insight. In a previous post, we outlined the ability to commit data to knowledge graphs, making it possible to see the connections that explain how and why organizations succeed in grant-making processes. In that publication, we grappled with the question of why knowledge graphs are useful for deriving information and surfacing the underlying structures of grants and proposals. In this publication, we go one step further: we show a more complete catalogue of views that takes the data we already have, supplements it with outside sources, and builds more informative analysis around specific steps of the grant-making process, all from the grant-making organization’s point of view.

To make our case, we use data from Torque, an API built by Open Tech Strategies for Lever for Change (hereafter “LFC”). We also infer what some of the data from another partner, MIT Solve (hereafter “Solve”), might look like in our catalogue of views. Links to examples are included where possible.

1. Questions we want to answer

In brainstorming sessions with MIT Solve staff, three key words surfaced as thematic elements: find, select, and support. We use the same verbiage here to frame our most important questions, then follow with our solutions in “A catalogue of views”:

“Find”

  • One of the biggest requests we get from interested organizations with grant-awarding data centers on finding quality applicants. This could mean visualizing how organizations pull applicants from around the world, or it could mean finding ways to increase the quality of applicants.

    • This can be achieved by mapping applicants onto a world map, so that competition administrators can see where in the world most of their applicants come from.

    • It can also be achieved by looking at when each application was received. This helps in “timelining” applications across geographic regions, to make sure outreach is fair and even, while also pointing to the level of support needed per region (if the granting organization’s staff is active in this stage of the application process).

“Select”

  • Another arena of inquiry surrounds the selection of applicants. Both Solve and LFC have wondered whether they are selecting the applicants who would most benefit from their contests and services. This points to either an audit of their internal processes and/or being transparent with proposing organizations about that process in general.

    • Other, secondary concerns involve making sure lesser-known organizations have a chance in the selection process. Visualizing this data is a first step we feel is important for showing how the process can be made more equitable.

    • Future steps might include creating new metrics that score applicants in a way that removes bias from the process, whether by incorporating our Relative Citation Ratio (RCR)-like metric or something else.

“Support”

  • Is there a common typology of needs among organizations based on their size? Or on the general mission of the organization? How can the grant-making organization best support the proposing organization, if not monetarily?

    • One way to approach this is to use NLP on both sides of the organizational relationship: what skills does the proposing organization have? What resources does the grant-making organization have? A recommendation engine can then “build ties” between the two organizations (see the sketch after this list).

    • We use word counts as a first pass at analyzing application jargon. This can be helpful later for slicing through vocabulary that points directly to the needs of organizations.
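To make the “build ties” idea concrete, here is a minimal sketch of matching a proposing organization’s skills to a grant-maker’s resources by text similarity. All skill and resource strings below are hypothetical placeholders, not real Torque or Solve data:

```python
# Sketch: match proposing organizations' skills to a grant-maker's
# resources via TF-IDF cosine similarity. All strings are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

org_skills = [
    "solar microgrid engineering, rural electrification",
    "mobile health diagnostics, community outreach",
]
funder_resources = [
    "pro bono engineering mentorship",
    "public health expert network, field pilots",
]

# Fit one shared vocabulary so the two sides are comparable.
matrix = TfidfVectorizer().fit_transform(org_skills + funder_resources)
skills_vecs = matrix[: len(org_skills)]
resource_vecs = matrix[len(org_skills):]

# Each row: how well one org's skills match each funder resource.
scores = cosine_similarity(skills_vecs, resource_vecs)
for org, row in zip(org_skills, scores):
    print(f"{org!r} -> best match: {funder_resources[row.argmax()]!r}")
```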

2. A catalogue of views

  1. The application process - “Find”

  • A view of the application process: Sankey diagram. This specific view, we can use by taking the information on Organization Source leads and mapping them to a shared identifying key that follows them throughout the process. LFC currently does not have this identifier, but Solve’s data allows for us to track various organizations through “rounds” of data, and thus, a shared keyspace for an organization or cohorts is possible. An interactive template can be found here.
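As a rough illustration, the Sankey view might be built with plotly along these lines; the stage names and counts are hypothetical stand-ins for what a shared organization identifier would let us compute from Torque records:

```python
# Sketch: an application-funnel Sankey diagram with invented counts.
import plotly.graph_objects as go

stages = ["Lead", "Applied", "Semifinalist", "Finalist", "Awarded"]
fig = go.Figure(go.Sankey(
    node=dict(label=stages),
    link=dict(
        source=[0, 1, 2, 3],       # index into `stages`
        target=[1, 2, 3, 4],
        value=[500, 120, 40, 10],  # organizations surviving each stage
    ),
))
fig.update_layout(title_text="Application funnel (hypothetical counts)")
fig.show()
```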

  • A view of the applicants on a map: Both LFC and Solve have voiced interest in seeing where funders and applicants come from. One of the easiest ways to show this is to put those funders and applicants on a map. From there, the applicants can be divided by contest, or the funders and applicants can be subdivided by country and heat-mapped by number of records (see below):
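A minimal sketch of the heat-mapped country view, assuming hypothetical per-country counts (a real pipeline would aggregate applicant records by country code):

```python
# Sketch: applicant counts per country as a choropleth. Counts invented.
import pandas as pd
import plotly.express as px

applicants = pd.DataFrame({
    "iso_alpha": ["USA", "KEN", "IND", "BRA", "NGA"],  # ISO-3 codes
    "applications": [120, 45, 80, 30, 60],
})
fig = px.choropleth(
    applicants,
    locations="iso_alpha",
    color="applications",
    color_continuous_scale="Viridis",
    title="Applicants by country (hypothetical counts)",
)
fig.show()
```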

  • A view of the applicants as a timeline of submission: Outreach has been a key area for Solve, both in making sure that their solicitation of applicants is fair and reaches all interested applicants at the same rate, and in seeing which geographic areas could benefit from increased recruiting. One way to see where the application process has historically lagged is a timeline view across geographic regions, team compositions, or team sizes (see below for an example of how this might be mapped, without Solve data).
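A sketch of how the timeline view might be computed with pandas; the column names (`submitted_at`, `region`) are assumptions about what a Solve export could contain, not actual field names:

```python
# Sketch: weekly submission counts per region, to spot lagging outreach.
import pandas as pd

df = pd.DataFrame({
    "submitted_at": pd.to_datetime(
        ["2021-06-01", "2021-06-03", "2021-06-15", "2021-06-20"]),
    "region": ["Africa", "Asia", "Africa", "Americas"],
})

# Count submissions per region per week.
timeline = (
    df.groupby([pd.Grouper(key="submitted_at", freq="W"), "region"])
      .size()
      .unstack(fill_value=0)
)
print(timeline)
```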

  • Other views: Viewing the “marketing funnel” of a specific contest. Below you can see example dashboards of possible sources of an applicant. 

  2. The quality application and metadata view - “Select” 

  • A view of team characteristics: Both LFC and Solve have metrics to track applicant field of study, diversity, and size of teams. Depending on the competition, optimizing for one (or multiple) of these factors can help eliminate sources of bias. Below is an example of applicants listed as “faculty” in their applications for each seasonal application cycle. 
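A sketch of the underlying tabulation, with hypothetical field names and values; the real fields would come from LFC or Solve application metadata:

```python
# Sketch: count applicants by role per application cycle. Data invented.
import pandas as pd

teams = pd.DataFrame({
    "cycle": ["2020 Fall", "2020 Fall", "2021 Spring", "2021 Spring"],
    "role": ["faculty", "student", "faculty", "practitioner"],
})

# Cross-tabulate role against cycle, e.g. how many "faculty" per cycle.
print(pd.crosstab(teams["cycle"], teams["role"]))
```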

  • A view of team proposed impact: Are all of LFC’s or Solve’s monetary benefits flowing to one region or one cause? One way to view these impacts is to tranche applicants into “portfolios of impact” and see where on the map they propose that impact will show itself. Below is an example not using LFC or Solve data; colors would represent different types of impact: infrastructure, education, racial equity, etc.
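A minimal sketch of the impact map, with invented coordinates and impact categories:

```python
# Sketch: proposed-impact locations colored by impact type. Data invented.
import pandas as pd
import plotly.express as px

impacts = pd.DataFrame({
    "lat": [-1.29, 28.61, -23.55],
    "lon": [36.82, 77.21, -46.63],
    "impact_type": ["education", "infrastructure", "racial equity"],
})
fig = px.scatter_geo(
    impacts, lat="lat", lon="lon", color="impact_type",
    title="Proposed impact by location (hypothetical data)",
)
fig.show()
```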

  • A view of completeness to help the applicant (idea under development): Have an applicant who is stuck in the process of making their idea a reality? It might be useful to track this process for “completeness” against internal metrics.

  3. Predictive impact analysis - “Support”

  • Analysis of applicant and organizational skills: By scraping LinkedIn for employee skills (if employees are listed as part of an organization), or by pulling various parts of the application sections (specifically the Competition Domain under the competition-specific search function in Torque), one can build a map of skills for a specific organization. See below for an example. By mapping this in a clustered way, we can get an idea of organizational focus and of where the grant-seeking organization may need support.
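A sketch of the clustering step, assuming skill descriptions have already been collected (whether from LinkedIn or from Torque’s Competition Domain field); the strings below are invented:

```python
# Sketch: cluster organizations by skill text with TF-IDF + k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

skill_texts = [
    "water sanitation, civil engineering, logistics",
    "machine learning, data science, health informatics",
    "clean water access, hydrology, field operations",
    "nlp, predictive modeling, epidemiology",
]
X = TfidfVectorizer().fit_transform(skill_texts)
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
for text, label in zip(skill_texts, labels):
    print(label, text)  # organizations sharing a label share a focus
```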

  • Abstract NLP and what that means for application and contest composition: Torque/LFC’s Applicant view provides access to various aspects of an applicant’s proposal, including the abstracts of most competition applicants. Using this information, we can see the most relevant terms from each application cycle. This further sets up the ability to test whether jargon bias plays a part in the process of being awarded an LFC grant.
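A first-pass sketch of the word-count analysis; the abstracts here are invented placeholders for what Torque/LFC’s Applicant view would return:

```python
# Sketch: term counts over applicant abstracts. Abstracts invented.
import re
from collections import Counter

abstracts = [
    "Scalable solar microgrids for last-mile electrification.",
    "Community-driven solar cooperatives for rural electrification.",
]
counts = Counter(
    word
    for text in abstracts
    for word in re.findall(r"[a-z]+", text.lower())
)
print(counts.most_common(5))  # candidate jargon for a given cycle
```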


  • In development: Predicting where the next quality applicants will come from (requested by Solve; LFC has also expressed interest here).

  • In development: Clustering based on what is most likely to be selected as a contest winner, or representing winner prediction mapped across the application process. Solve’s multi-round process allows us to predict on various factors that might make one organization an overall “winner” of the process, or at least to predict specific outcomes, such as the amount of venture capital most likely to be raised.
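A very rough sketch of what a winner-prediction baseline could look like; the features (rounds advanced, team size, prior funding) and labels are hypothetical stand-ins for Solve’s multi-round data:

```python
# Sketch: a logistic-regression baseline over per-round features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: rounds advanced, team size, prior funding (thousands USD).
X = np.array([[3, 5, 200], [1, 2, 10], [4, 8, 500], [2, 3, 50]])
y = np.array([1, 0, 1, 0])  # 1 = eventual winner (invented labels)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba([[3, 4, 150]]))  # [P(not winner), P(winner)]
```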

3. Obtaining the data and what we can use from it

Torque, the API we use to collect data, is easily structured to support a contest-centric ontology. This is where the contest exists as a central node in the structure of the knowledge graph.
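A minimal sketch of such a contest-centric graph, with the contest as the hub node; the node names and relation labels below are illustrative, not Torque’s actual schema:

```python
# Sketch: a contest-centric knowledge graph. The contest is the central
# node; applicants, a funder, and a topic hang off of it. Names invented.
import networkx as nx

G = nx.Graph()
contest = "Example Contest 2021"
G.add_node(contest, kind="contest")
for org in ["Org A", "Org B"]:
    G.add_edge(contest, org, relation="applied_to")
G.add_edge(contest, "Example Funder", relation="funded_by")
G.add_edge(contest, "global health", relation="topic")

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```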

Insert here:

  • Knowledge graph; link out to knowledge graph representation

  • Write up the way we ingest Solve data.

  • Upload [to GitHub repo] the script using the Torque API… script for knowledge graph ingestion and data table relationships
