
Keyword Analysis, UMAP Lexical Discovery, and LLMs: What themes arise in mental health grant proposals?

In this publication, we develop methodologies for analyzing keywords in grant proposals across three organizations: Wellcome Trust, Lever for Change, and MIT Solve. We use a previously established framework by Homeworld Collective.

Published on Aug 30, 2023

Colab notebook here.


One way to enable trend-finding in grant making is to look at the words applicants use in their project descriptions. Using a database of applicants that spans three organizations, covers 5 years, and contains roughly 500 or more awards, we use keyword search and analysis to form some ideas about the types of projects that received philanthropic dollars over that period. The following presents preliminary keyword-analysis results, using a lexicon of mental health-related keywords and a lexicon of institutional keywords, both generated by LLMs. In addition, we created embeddings of the project descriptions and mapped them with UMAP to find similar lexical clusters, and we used LLM classification to predict the category of project descriptions, among other analyses.

Establishing keyword lists for mental health projects

To limit the scope of our analysis, we established a keyword list with help from ChatGPT, academic literature, and experts in the field. Generating these "mental health" keywords involved a review of existing literature, research, and common terminology within the field.

Additionally, online databases, academic publications, and reputable mental health organizations were referenced to identify prevalent keywords associated with mental health topics. The keywords were chosen to encompass various aspects, including various mental health conditions, treatment approaches, support mechanisms, and societal factors. This iterative process aimed to create a well-rounded and versatile list that could be used effectively for searching and categorizing information within the mental health domain.

keywords_mental_health = ["depression","anxiety","bipolar disorder",
                          "schizophrenia", "stress", "PTSD", "mental illness",
                          "therapy", "counseling", "psychotherapy", "cognitive-behavioral therapy (CBT)",
                          "psychiatric medications", "self-care", "wellness",
                          "mindfulness","meditation","emotional well-being", "mental health stigma",
                          "support groups", "mental health awareness", "suicide prevention", "mental health services",
                          "mental health resources", "psychologist", "psychiatrist", "social worker", "mental health advocate",
                          "coping strategies", "mental health education", "mental health assessment" ]

Another axis of search covered keywords related to institutional subject areas:

institutional_areas_keywords = ["administration", "academics", "research", "finance", "human resources", 
                                "marketing", "communications", "public relations", "information technology", 
                                "logistics", "facilities management", "legal", "enrollment", "student affairs", 
                                "alumni relations", "outreach", "diversity and inclusion", "governance", "compliance", 
                                "development", "planning", "strategy", "procurement", "security", "infrastructure", 
                                "health services", "athletics", "library services", "operations", "risk management", 
                                "community engagement", "global initiatives", "sustainability", "customer service", 
                                "quality assurance", "training", "graduate programs", "undergraduate programs", 
                                "public affairs", "corporate relations", "laboratory management", "online education", 
                                "extension services", "publishing", "innovation", "international relations", "partnerships", 
                                "curriculum development", "institutional research", "project management"]

Incorporating Wellcome Trust data into our applicant data pool

In order to get a better idea of what the universe of grant applications from mental health-related grantee organizations looks like, we incorporated Wellcome Trust’s data into our already-growing base of data. To do this, we looked at the Excel sheet on their website and filtered the data based on our keywords in a regex generated from the above list. The result was adding 400+ awardees to our applicant database.
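The filtering step can be sketched roughly as follows. This is a minimal illustration, not the script used for the publication: the shortened keyword list, the `description` field name, the sample rows, and the helper names `build_keyword_regex` and `filter_rows` are all assumptions made for the example.

```python
import re

# Hypothetical subset of the mental health keyword list, shortened for the example.
keywords = ["depression", "anxiety", "PTSD", "cognitive-behavioral therapy (CBT)"]

def build_keyword_regex(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole phrase."""
    # re.escape keeps characters such as "(" and "-" literal;
    # longer phrases go first so they win over shorter overlapping ones.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    # Lookarounds instead of \b, because some keywords end in ")" where \b would not match.
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)

def filter_rows(rows, pattern, field="description"):
    """Keep only the rows whose text field mentions at least one keyword."""
    return [row for row in rows if pattern.search(row.get(field) or "")]

# Made-up rows standing in for the spreadsheet export (assumption).
rows = [
    {"title": "A", "description": "A trial of new treatments for anxiety in teenagers."},
    {"title": "B", "description": "Mapping malaria transmission in coastal regions."},
    {"title": "C", "description": "Peer support for veterans living with ptsd."},
]

pattern = build_keyword_regex(keywords)
matched = filter_rows(rows, pattern)
print([row["title"] for row in matched])  # ['A', 'C']
```

Escaping each keyword matters here because entries such as "cognitive-behavioral therapy (CBT)" contain regex metacharacters that would otherwise change the meaning of the pattern.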

Analyzing the mental health grant record of all 3 groups

The following is a word cloud of the most commonly occurring word pairs from the institutional keyword list and the mental health keyword list (both generated by ChatGPT) in the Project Description area of the Philanthrobotics applicant data.

Using a keyword-pair search across all project descriptions, we found some of the most common paired keywords.
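A keyword-pair search of this kind might look like the sketch below. It is an assumption about the approach rather than the publication's own script: the shortened lists, the sample descriptions, and the function name `count_keyword_pairs` are hypothetical, and plain substring matching is used for brevity (so "stress" would also match inside "distress").

```python
from collections import Counter
from itertools import product

# Hypothetical, shortened versions of the two keyword lists.
mental_health = ["depression", "anxiety", "stress"]
institutional = ["research", "training", "outreach"]

def count_keyword_pairs(descriptions, list_a, list_b):
    """Count, for every pair, how many descriptions mention both keywords."""
    # Start every pair at zero so pairs that never co-occur stay visible.
    counts = Counter({pair: 0 for pair in product(list_a, list_b)})
    for text in descriptions:
        lowered = text.lower()
        found_a = [k for k in list_a if k.lower() in lowered]
        found_b = [k for k in list_b if k.lower() in lowered]
        counts.update(product(found_a, found_b))
    return counts

# Made-up project descriptions (assumption).
descriptions = [
    "Research on depression and anxiety among students.",
    "Training community workers to recognise depression.",
    "Depression research in rural clinics.",
]

pair_counts = count_keyword_pairs(descriptions, mental_health, institutional)
most_common = pair_counts.most_common(3)
never_paired = [pair for pair, n in pair_counts.items() if n == 0]
print(most_common[0])     # (('depression', 'research'), 2)
print(len(never_paired))  # 6
```

Counting each pair at most once per description keeps one long proposal from dominating the totals; the same counter also gives the rarest pairs by reading the ranking from the other end.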

In the same vein, we also observed the least likely paired keywords:

Our script also allows us to look at specific keywords. Let’s say we’re interested in mental health education. Which mental health subject areas appear most often in proposals in the educational area?
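A single-keyword lookup could be sketched as follows. The focus term "education", the shortened keyword list, the sample descriptions, and the function name `co_mentions` are assumptions chosen for illustration, and matching is by simple substring.

```python
from collections import Counter

# Hypothetical, shortened mental health keyword list.
mental_health = ["depression", "anxiety", "suicide prevention", "mindfulness"]

def co_mentions(descriptions, focus, keywords):
    """Rank keywords by how many descriptions mention them alongside the focus term."""
    focus = focus.lower()
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        if focus not in lowered:
            continue  # skip projects that never mention the focus term
        counts.update(k for k in keywords if k.lower() in lowered)
    return counts.most_common()

# Made-up project descriptions (assumption).
descriptions = [
    "An education program teaching mindfulness to reduce anxiety in schools.",
    "Suicide prevention hotline staffed by volunteers.",
    "Teacher education on spotting anxiety early.",
]

ranking = co_mentions(descriptions, "education", mental_health)
print(ranking)  # [('anxiety', 2), ('mindfulness', 1)]
```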

Have an inquiry into the types of words you would like to see researched on our corpus of applicant data? Submit an idea here!

UMAP Clustering for Lexical Neighbors

UMAP is a dimensionality reduction technique. According to the OSS maintainers, “Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction.” This suits our use case because the relationships between words across project descriptions need not be linear. Mapping lexical similarities into 2 dimensions lets us look for interesting keyword pairs and groups of related applications.
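As a rough sketch of the projection step, the example below turns each description into a vector of keyword counts and reduces those vectors to 2 dimensions with the `umap-learn` package. The keyword-count vectors are a stand-in assumption, since the embedding method is not specified here; the helper names, the shortened keyword list, and the parameter values are likewise hypothetical.

```python
# Hypothetical, shortened keyword list used as the feature space.
keywords = ["depression", "anxiety", "stress", "research", "training"]

def keyword_vector(description, keywords):
    """Count how often each keyword occurs in one description."""
    lowered = description.lower()
    return [lowered.count(k.lower()) for k in keywords]

def project_to_2d(vectors, n_neighbors=15, random_state=42):
    """Reduce the vectors to 2 dimensions with UMAP (requires numpy and umap-learn)."""
    # Imported lazily so the vector helper works without these packages installed.
    import numpy as np
    import umap
    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors, random_state=random_state)
    return reducer.fit_transform(np.asarray(vectors, dtype=float))

# Made-up project descriptions (assumption).
descriptions = [
    "Research on stress and depression in new parents.",
    "Training for nurses to manage stress, stress, and more stress.",
]

vectors = [keyword_vector(d, keywords) for d in descriptions]
print(vectors)  # [[1, 0, 1, 1, 0], [0, 0, 3, 0, 1]]

# With a realistically sized corpus (many more rows than n_neighbors):
# coords = project_to_2d(vectors)  # one (x, y) point per description
```

Fixing `random_state` makes the layout reproducible between runs, which helps when comparing clusters across versions of the applicant data.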

Static placeholder until the Dash app is embedded as an interactive object. Groups 0-9 are clusters of grant applications that UMAP places close together because they share words of interest. For instance, the dark group here is focused on words around “stress”, “depression”, and “diversity and inclusion”.

Can an LLM Predict the Category of a Grant Application based on its Project Description?

In progress. Delivery by Wednesday, October 18.

With the following Python code, we can send an LLM a prompt with a simple instruction: “classify our grant applicant into a category” (in our case, a medicine-based project or a therapy-based project). This is one way program managers can apply LLMs to their everyday work, using them like virtual assistants for faster analysis.

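import openai  # legacy Completions API (openai<1.0), as in the original call
from tqdm import tqdm

tqdm.pandas()  # registers progress_apply on pandas objects

# Ask the LLM to label one project description as medicine-based or therapy-based
def classify_abstract(abstract):
    prompt = f"This abstract describes a philanthropic project: '{abstract}'. Is this a medicine-based project or a therapy-based project? Just say 'medicine-based' or 'therapy-based'."
    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",  # assumption: the original call's arguments were lost
        prompt=prompt,
        max_tokens=10,  # assumption: the answer is a short label
    )
    return response.choices[0].text.strip()

# Classify only the rows of the existing DataFrame df that have no guess yet
mask = df['project_type_guess'].isna()
df.loc[mask, 'project_type_guess'] = df.loc[mask, 'Project/Solution Description'].progress_apply(classify_abstract)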

Analyzing the mental health grant record of Lever for Change

In progress.

Analyzing the mental health grant record of MIT Solve

In progress.

Analyzing the mental health grant record of Wellcome Trust

In progress.
