
Exploring a Grants and Proposals Knowledge Graph

Towards open data on the graph: representing the relationships between grant-making organizations and grant-receiving organizations (with everything in between!)

Published on Jan 10, 2022
I. Introduction

The grant-making process is both long and convoluted. Those same characteristics, however, generate a great deal of data and many relationships between data points that may otherwise seem disparate. This drives questions that grant-makers are interested in answering:

  • Is my organization effectively allocating funds to other organizations/people who make a high impact in our organization’s areas of interest?

  • Is our organization giving the best organizations (not necessarily the oldest, most established) a shot to do novel work?

  • Where are the areas that we could improve our processes?

    • Are there biases within the application process?

    • Do many applicants fall out of competition due to some externality?

These are just a few questions that are worth exploring as the generation of more and more data around various organizations’ grant-making processes is made accessible.

In this publication, we explore past and present research relevant to constructing what we call an Open Grants Layer (OGL): a layer that maintains data on grant-making organizations, on grant-receiving individuals and organizations, and on the processes that connect them, such as peer review of proposals and the financial metadata of competitions, to name a few.

II. The Knowledge Graph (KG)

A natural first question in our endeavor to build OGL is: what is a knowledge graph? A knowledge graph is a directed, labeled graph: it consists of nodes, edges, and labels on both, and those labels have defined meanings.1 A node can represent any entity one chooses (a company, a person, a position, a demographic) and is usually connected to other nodes via edges. An edge captures the relationship between two nodes. Below is an example of a knowledge graph of fictitious people who have visited various art museums in France.

A representation of a knowledge graph around France and fine art. Courtesy of Let the Machines Learn.
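The same node-edge-label idea can be sketched in a few lines of Python. This is only an illustration: the people, the relation names (`visited`, `located_in`, `capital_of`), and the helper functions are all invented for the example and are not taken from the figure or from OGL.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; every name here is invented.
TRIPLES = [
    ("Alice", "visited", "Louvre"),
    ("Bob", "visited", "Louvre"),
    ("Bob", "visited", "Musée d'Orsay"),
    ("Louvre", "located_in", "Paris"),
    ("Musée d'Orsay", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
]


def build_graph(triples):
    """Index directed, labeled edges by their source node."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def neighbors(graph, node, relation):
    """Return the targets reached from `node` via edges labeled `relation`."""
    return [target for label, target in graph.get(node, []) if label == relation]


def visitors_of(triples, museum):
    """Follow 'visited' edges backwards to find who visited a museum."""
    return sorted(s for s, r, o in triples if r == "visited" and o == museum)


graph = build_graph(TRIPLES)
print(neighbors(graph, "Bob", "visited"))
print(visitors_of(TRIPLES, "Louvre"))
```

Storing the graph as triples keeps the direction and the label of every edge explicit, which is what distinguishes a knowledge graph from a plain adjacency list.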

IIA. Early Ontologies of a Grants-Oriented Knowledge Graph

The use case for a grants-oriented knowledge graph is not new. Many researchers, funding agencies, and workers in the social impact space are under pressure to report on the rationality and sustainability of their funding strategies. Policy has followed suit: there is even greater political pressure for data-driven assessments of non-profit organizations.

Knowledge graphs are well-suited to aid in better impact assessment, sustainability efforts, and policy decisions because of their explicit mapping of relationships between nodes that are either implicitly known to grantmakers or hidden to those allocating the money altogether. As such, there have been previous attempts at such work.

One of those previous ontologies is DINGO, or the Data Integration and Extension for Grant Ontology. DINGO focuses on “the research/cultural landscape, with particular focus on the research/cultural activities and their funding.” The model is based on nine principles that dictate its structure, most importantly its eight classes: Project, Grant, FundingAgency, FundingScheme, Role, Person, Organization, and Criterion. More can be read on its website.

The DINGO Ontology, visualized.

Another implementation of a knowledge graph in the grant-making space focuses more on productivity (i.e., patents and publications): the Academia/Industry DynAmics (AIDA) Knowledge Graph. It describes 14 million publications and 8 million patents according to research topics drawn from the Computer Science Ontology, and characterizes a small subset of those products by academic affiliation or industrial sector.

IIB. Possibilities for Expansion: Review Process, Human Capital, Detecting Bias

Although AIDA and DINGO are not the only KGs that make up the “infosphere” of possible KG solutions for the grant-making space, they both point to important areas in the grant-making space that need to be 1) combined in a creative way, and 2) expanded upon to reveal more about the grant-making process.

We have identified various places where the KG literature in grant-making can be improved, and propose to explore these in OGL. Here are but a few of our proposed explorations (non-exhaustive):

  • Establishing new, broader ontologies of productivity that answer the question: what does it mean to be productive after receiving a grant, beyond academic citations? For example, human capital (mentorship of people who go on to do meaningful work) or the policy implications of research could be mapped to the KG.

  • Detecting bias in review processes by mapping geographic, ethnic/demographic, and seniority-specific information to the KG.

  • In the same vein as the bullet above, shedding light on the review processes, scores, and language that go into proposal reviews, submissions, and edits, which can be mapped onto a submissions sublayer.

III. The Use of Machine Learning (ML) to Unlock Impact Modeling on the Graph

One exciting aspect of using a knowledge graph is the variety of machine learning methods that can be utilized to do meaningful, predictive work from our accumulated body of knowledge. More specifically, we’ve recognized three vital areas of what we are calling Impact Prediction Modeling:

  1. Patents and Intellectual Property

  2. Citations and Academic Literature

  3. Investments and Research “Portfolios”

In the following sections, we will outline the work that has already been done and some of the possibilities to expand or pioneer new work in the area.

IIIA. Patents and Intellectual Property

In recent work by Deng & Ma (Electronic Commerce Research, 2021), the team encoded patent knowledge in a knowledge graph built from semantic information about the keywords in a patent’s domain. Using machine learning, they profile patents and companies as weighted graphs-within-the-graph and generate recommendations based on a graph edit distance measure. This allowed them to recommend patents that would further a company’s business goals through supplementary, complementary, or hybrid strategies.
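To make the idea of ranking by graph edit distance concrete, here is a minimal sketch. It is not Deng & Ma's method: it assumes that every keyword appears at most once per profile (unique node labels), in which case the edit distance reduces to counting the differences between node sets and edge sets. The cost model (1 per keyword added or removed, absolute weight difference per edge) and all keywords, weights, and function names are assumptions made for illustration.

```python
def make_profile(weighted_edges):
    """Build a keyword-graph profile from {(keyword_a, keyword_b): weight}."""
    edges = {frozenset(pair): weight for pair, weight in weighted_edges.items()}
    nodes = set().union(*edges) if edges else set()
    return {"nodes": nodes, "edges": edges}


def edit_distance(profile_a, profile_b):
    """Edit distance between two keyword graphs with unique node labels.

    Assumed cost model: 1 per keyword present in only one graph, plus the
    absolute weight difference per edge (a missing edge counts as weight 0).
    """
    node_cost = len(profile_a["nodes"] ^ profile_b["nodes"])
    all_edges = set(profile_a["edges"]) | set(profile_b["edges"])
    edge_cost = sum(
        abs(profile_a["edges"].get(e, 0.0) - profile_b["edges"].get(e, 0.0))
        for e in all_edges
    )
    return node_cost + edge_cost


def recommend(company, patents, top_k=1):
    """Return the names of the `top_k` patents closest to the company profile."""
    ranked = sorted(patents, key=lambda name: edit_distance(company, patents[name]))
    return ranked[:top_k]


# Hypothetical profiles; keywords and weights are invented.
company = make_profile({("battery", "anode"): 1.0, ("battery", "electrolyte"): 0.5})
patents = {
    "P1": make_profile({("battery", "anode"): 1.0, ("battery", "electrolyte"): 1.0}),
    "P2": make_profile({("drone", "rotor"): 1.0}),
}
print(recommend(company, patents))
```

General graph edit distance is expensive to compute; the unique-label assumption is what keeps this version to simple set operations.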

In earlier work, Duan & Chiang (BigSpatial ‘16) used a knowledge graph to predict the geographic location and the year in which a technology will emerge. They applied common ML techniques, such as Support Vector Machines and multiple Hidden Markov Models, to a KG built from heterogeneous data sources and reported high predictive performance on patent data.

Some possibilities for expansion in these areas include:

  • Using non-profit grant data to predict centers of patentable interventions or technologies across contexts, locations, and institutions receiving grants given priors from the institutions and constraints on money allocation.

  • Contest-specific patent correlations to track outcome targets for granting institutions.

IIIB. Citations and Academic Literature

Previous work on citations and academic literature has been extensive. One of the more prominent recently published ML techniques is DELPHI (Dynamic Early-warning by Learning to Predict High Impact), which predicts high-impact research using paper, author, journal, article, and network-related metrics.

DELPHI’s pipeline of data collection, structuring, and computation of early warnings of impact (Weis and Jacobson, 2021, Nature Biotechnology).

In addition, Wellcome Link has been used by the Wellcome Trust to track specific grant proposals and their academic output where Wellcome is acknowledged. However, a specific KG is not defined here, so mention-based KG construction would be useful for any marketing or portfolio analysis by interested grantmaking organizations.

Some possibilities for expansion here include:

  • Matching citations to mentions in science-based policy work, and predicting areas of impact for policy-based decisions

  • Using metrics other than impact factor, h-index, etc., that are more time- and journal-independent, such as the relative citation ratio (RCR), to create new measures of early warning of high impact or of the career impact of project leaders (à la work cited in Fortunato et al., 2018, in Science)
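As a rough illustration of why a ratio-style metric is less sensitive to a paper's age and venue, the sketch below divides a paper's citations per year by the average citations per year of a comparison set. This is a deliberate simplification: the published RCR derives its expected rate from a paper's co-citation network and a benchmark of NIH-funded papers, whereas the plain mean used here, along with the field names and numbers, is an assumption for the example.

```python
def citations_per_year(citations, pub_year, current_year):
    """Citation rate, counting at least one year since publication."""
    years = max(current_year - pub_year, 1)
    return citations / years


def relative_citation_ratio(paper, comparison_set, current_year):
    """Simplified ratio: the paper's citation rate over the mean rate of its peers.

    A value above 1.0 means the paper is cited faster than the comparison set.
    """
    if not comparison_set:
        raise ValueError("comparison_set must not be empty")
    rate = citations_per_year(paper["citations"], paper["year"], current_year)
    peer_rates = [
        citations_per_year(p["citations"], p["year"], current_year)
        for p in comparison_set
    ]
    expected = sum(peer_rates) / len(peer_rates)
    if expected == 0:
        raise ValueError("comparison set has no citations")
    return rate / expected


# Hypothetical papers; the counts are invented.
paper = {"citations": 40, "year": 2018}
peers = [{"citations": 20, "year": 2018}, {"citations": 30, "year": 2016}]
print(relative_citation_ratio(paper, peers, current_year=2022))
```

Because both the numerator and the denominator are per-year rates, an older paper does not score higher simply for having had more time to accumulate citations.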

IIIC. Investments and Research “Portfolios”

Grantmaking organizations can also think of the money they allocate as an investment, which opens up interesting applications of portfolio analysis and portfolio theory.

Previous looks at this aspect of knowledge construction have come from the Wellcome Trust, which leads the Research on Research Institute (RoRI for short). The Wellcome team clusters specific subtopics and subdisciplines over various time scales to see how an organization is “balancing their portfolio” (see below for a visual example).

Other work, by Richard Klavans and Kevin W. Boyack (Jour. of Informetrics, 2017), has sought to establish topic prominence in grant data using project-level metadata. This topic-modeling approach drew on 91,726 topics and over 58 million documents, and assigned a topic to 300,000 grants using the cross-modeled data. The results highlighted the prominence of certain topics and showed that funding per author correlates with topic prominence.

Some possibilities for expansion in this area include:

  • Using portfolio analysis to allocate funds or “match-make” to fill funding gaps if grantmaking organizations cannot find viable proposals to fund

  • Building a portfolio index that grantmaking organizations can cross-reference to optimize and diversify their portfolios.

  • Prediction of funding allocation ROI based on organization location or location of impact
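One simple starting point for a portfolio index is a concentration measure over funding shares per topic. The sketch below uses the Herfindahl-Hirschman index, a common concentration measure that we are borrowing here as an assumption; the text above does not commit to any particular index, and the topics and amounts are invented.

```python
def funding_shares(grants):
    """Turn (topic, amount) pairs into each topic's share of total funding."""
    totals = {}
    for topic, amount in grants:
        totals[topic] = totals.get(topic, 0.0) + amount
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total funding must be positive")
    return {topic: amount / grand_total for topic, amount in totals.items()}


def herfindahl_index(shares):
    """Sum of squared shares: 1.0 means one topic gets everything."""
    return sum(share * share for share in shares.values())


def effective_topics(shares):
    """Number of equally funded topics that would give the same concentration."""
    return 1.0 / herfindahl_index(shares)


# Hypothetical portfolio; topics and dollar amounts are invented.
grants = [("oncology", 500_000), ("oncology", 250_000), ("climate", 250_000)]
shares = funding_shares(grants)
print(shares)
print(herfindahl_index(shares))
print(effective_topics(shares))
```

An organization could track this number over time or compare it against peers; a falling effective topic count would suggest the portfolio is concentrating.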

IV. Using Visualizations to Build a “Catalog of Views” through sublayers

Aside from machine learning, we hope to establish a commonly used, easily accessible structure for our KG: one that is easy to adhere to and flexible enough to support data trusts among participating organizations.

To do this, we must establish what we are calling sublayers. Think of a sublayer as one aspect of the grantmaking process that you want to put into KG format.

Currently, we have three sublayers (with more to come!) that have generated their own visualizations which we hope will be quick visual cues to grantmaking organizations on how to best track their grantmaking processes.

IVA. The Financial Sublayer

The Financial sublayer is an amalgamation of the dollars and cents that go into grantmaking. Below is a conceptualization of the KG version of the information we could theoretically obtain:
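In code, one way to write down such a conceptualization is as a small set of typed edges between funders, grants, and recipients, with dollar amounts attached to grants. The node identifiers, relation names (`AWARDED`, `RECEIVED_BY`), and amounts below are hypothetical and do not reflect the actual OGL schema or NIH ExPORTER fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str


# Hypothetical grant amounts in dollars, keyed by grant node.
grant_amounts = {"grant:001": 250_000, "grant:002": 100_000, "grant:003": 75_000}

# Hypothetical edges: funders award grants, grants are received by organizations.
edges = [
    Edge("funder:A", "AWARDED", "grant:001"),
    Edge("funder:A", "AWARDED", "grant:002"),
    Edge("funder:B", "AWARDED", "grant:003"),
    Edge("grant:001", "RECEIVED_BY", "org:X"),
    Edge("grant:002", "RECEIVED_BY", "org:Y"),
    Edge("grant:003", "RECEIVED_BY", "org:X"),
]


def total_by(edges, amounts, relation):
    """Sum grant amounts per node at the non-grant end of `relation` edges."""
    totals = {}
    for edge in edges:
        if edge.relation != relation:
            continue
        # The grant may sit at either end of the edge, depending on the relation.
        if edge.target in amounts:
            grant, other = edge.target, edge.source
        else:
            grant, other = edge.source, edge.target
        totals[other] = totals.get(other, 0) + amounts[grant]
    return totals


print(total_by(edges, grant_amounts, "AWARDED"))      # dollars awarded per funder
print(total_by(edges, grant_amounts, "RECEIVED_BY"))  # dollars received per organization
```

Aggregations like these two totals are the kind of summary a financial dashboard would surface.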

This view has been rendered using NIH ExPORTER data in a KG that looks like so, and can be summarized in a dashboard for easy visual cues:

This was done in Neo4j, a popular graph database. A NeoDash dashboard was then built on aggregations of this data to summarize the financial information, shown here:

IVB. The Peer-Review Sublayer

The peer-review sublayer is an objective look at who performs peer review at a grantmaking organization, what is reviewed, and how it is done. It can be used for descriptive statistics on scoring, or, as mentioned above, to detect bias in the grantmaking process. A conceptualization of the sublayer can be seen below.

As an example, Lever for Change data across their various competitions can be modeled in Neo4j’s graph database to show that the only organizations that scored in the 95-100 range came from a single contest:

Other sublayers, such as a Demographics sublayer, are still under construction as the methodologies for inferring demographics and obtaining public data are being explored.
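The kind of query behind a view like this can be sketched in plain Python, independent of any graph database. The records below are invented placeholders, not real Lever for Change data, and the function name and score band handling are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical (organization, contest, score) review records; not real data.
reviews = [
    ("Org A", "Contest 1", 97.5),
    ("Org B", "Contest 1", 95.0),
    ("Org C", "Contest 2", 88.0),
    ("Org D", "Contest 3", 91.2),
    ("Org E", "Contest 2", 94.9),
]


def contests_in_band(reviews, low, high):
    """Group organizations by contest for scores within [low, high] inclusive."""
    by_contest = defaultdict(list)
    for org, contest, score in reviews:
        if low <= score <= high:
            by_contest[contest].append(org)
    return dict(by_contest)


top_band = contests_in_band(reviews, 95, 100)
print(top_band)
print(len(top_band) == 1)  # True when every top-band score comes from one contest
```

A result with a single key is the signal described above: top-band scores concentrated in one contest, which may be worth a closer look at how that contest was scored.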

V. Conclusions and Future Work

Above, we presented our ambition to take the grantmaking process and put it on what is known as a Knowledge Graph. Our methodologies will be wide-ranging in the knowledge they accumulate, using both machine learning and visualizations to key both proposal writers and grantmaking organizations into the important factors that go into the awarding process. Through our use of sublayers, we hope that individual subsections of the graph will tell unique stories while interconnecting to enable impactful predictive research on the graph.

Samuel Klein:

Link to the dataset?

Samuel Klein:

Will there be a public instance with the data for this?