A collection of running meeting notes among collaborators at KFG and VUB.
Monday, December 19, 2022
Meeting between Kevin and Carmen
Determining the PubPub presence we would like to have:
This document will be used for running updates of our meetings (Kevin will consolidate previous meeting notes here)
Research notes:
LUCID applied to our research question; a blog post explaining, in layman’s terms, the procedural steps we take
Write an explanation of the LUCID results (living results; probably no access to Colab just yet)
Line-by-line code breakdown of the model
Writing about the VUB grants data collected (Vincent should have an update in January)
Data-sharing agreements (including VUB’s data-sharing policies) in January
Mirror bias modeling: solicitations from SOLVE about various challenges to model language from grant applications
Thursday, January 5, 2023
Added some descriptive text to the data; the final column is “Decision”, and it is computed by the following Excel formula:
IF(SO("Published","Selected","Finalist","Semi-Finalist"),"YES","NO")
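For cross-checking the “Decision” column outside Excel, the rule can be written in Python. This is a sketch under assumptions: the recorded formula uses `SO(...)`, which is not a standard Excel function, so it is read here as an OR over status comparisons, and since the formula does not name its input column, the function simply takes one status string per row.

```python
# Statuses that map to "YES", taken from the Excel formula above (stored casefolded).
POSITIVE_STATUSES = {"published", "selected", "finalist", "semi-finalist"}

def decision(status: str) -> str:
    """Return "YES" if status is one of the positive statuses, else "NO"."""
    # Excel's "=" text comparison ignores case, so compare casefolded strings.
    return "YES" if status.casefold() in POSITIVE_STATUSES else "NO"

# Illustrative status values, not taken from the dataset.
for s in ["Selected", "Semi-Finalist", "Rejected"]:
    print(s, "->", decision(s))
```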
Debugging of code in Colab notebook
Possible next steps?
Error in code:
Line #110663 (got 1 columns instead of 15)
Line #110664 (got 1 columns instead of 15)
Line #110665 (got 1 columns instead of 15)
Line #110666 (got 5 columns instead of 15)
Line #110667 (got 1 columns instead of 15)
This was caused by how Python read the CSV. I solved it by saving the CSV with a different encoding (not UTF-8).
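Rows like those can be located before loading by counting the fields in each record with Python's standard `csv` module. This is a sketch: the encoding that actually fixed the file is not recorded in these notes, so `latin-1` in the commented usage is only an example, and `grants.csv` is a hypothetical filename. The expected count of 15 comes from the error messages above.

```python
import csv
import io

def find_bad_rows(text_stream, expected_cols=15):
    """Yield (record_number, field_count) for records whose field count differs from expected_cols.

    record_number counts CSV records, not physical lines, because quoted fields may span lines.
    """
    reader = csv.reader(text_stream)
    for record_number, row in enumerate(reader, start=1):
        if len(row) != expected_cols:
            yield record_number, len(row)

# Usage with a real file (hypothetical name; the encoding is an example):
# with open("grants.csv", newline="", encoding="latin-1") as f:
#     for record_number, n in find_bad_rows(f):
#         print(f"Record {record_number}: got {n} columns instead of 15")

# Small in-memory demo with 3 expected columns.
demo = io.StringIO('a,b,c\n1,2,3\n"multi\nline",5,6\nbroken\n')
print(list(find_bad_rows(demo, expected_cols=3)))  # [(4, 1)]
```

Because `csv.reader` understands quoted fields, a text field containing line breaks stays a single record; a parser that splits on every newline would instead see several short lines, which is one possible source of “got 1 columns” errors.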
Some TypeErrors occurred:
Results from the running script:
Data count: 9704
Training data count: 7278
Test data count: 2426
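These counts are consistent with a 75/25 train/test split: 7278 + 2426 = 9704, and 2426 / 9704 = 0.25. The sketch below reproduces the counts; the split ratio is inferred from these numbers rather than taken from the notebook, and the shuffling and seed are arbitrary choices.

```python
import math
import random

def train_test_split_indices(n_rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Round the test size up, as scikit-learn's train_test_split does for a fractional test_size.
    n_test = math.ceil(test_fraction * n_rows)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(9704)
print(len(train_idx), len(test_idx))  # 7278 2426
```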
Work on an introduction to applying LUCID to grant application data (draft in a Pub here)
Splitting up of tasks:
Kevin: describing the data (I will have to talk to SOLVE about this and see how I might do this in a Pub)
Carmen:
Check up on VUB grants data collected by Vincent
Data sharing agreements from VUB have been uploaded to the data sharing agreements repo here!
Action items:
Start off with all data together, and then do individual calls
Do not include drafts in the first run
Carmen, Kevin: debug the NoneType issue by looking at the person_inputs variable and what it should contain
Kevin: Update the metadata to be more specific around Drafts
Carmen: Update the structure of the LUCID Pub
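The first-run plan from the action items (all data together, drafts excluded, individual calls later) might be set up as below. This sketch assumes “individual calls” means separate Solve challenge calls; rows are plain dicts, and the `status`/`call` keys and the `"Draft"` label are hypothetical, since the real names depend on the metadata update around Drafts.

```python
from collections import defaultdict

def first_run_rows(rows, status_key="status"):
    """Drop applications labeled "Draft" (hypothetical label) for the first run."""
    return [r for r in rows if r[status_key] != "Draft"]

def split_by_call(rows, call_key="call"):
    """Group rows by call so each call can be run on its own later."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[call_key]].append(r)
    return dict(groups)

# Illustrative rows only.
rows = [
    {"call": "A", "status": "Selected"},
    {"call": "A", "status": "Draft"},
    {"call": "B", "status": "Rejected"},
]
kept = first_run_rows(rows)          # all calls together, drafts removed
print(len(kept))                     # 2
print(sorted(split_by_call(kept)))   # ['A', 'B']
```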
Thursday, February 23, 2023
Trained the initial model. Here are the steps Kevin was able to reach in the shared notebook.
Encoding of data
Training of the initial model
Challenges: the data might be encoded incorrectly, or we may need to add something to the model.
Kevin updated some data descriptions in the post stub. Will finish that later today!
Kevin (for next time): the data is being encoded as all 0s, which does not seem right; try 10 data points and print out the results as we encode them
Carmen: Start expanding the intro of the blogpost.
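The debugging plan for the all-0s encoding (encode about 10 data points and print each result) could look roughly like this. The one-hot encoder, field names, and values are illustrative assumptions, not the notebook's code. The demo shows one possible cause of all-0 vectors: values that differ from the vocabulary only in case or whitespace match no category.

```python
def build_vocab(rows, fields):
    """Collect the sorted set of observed values for each categorical field."""
    return {f: sorted({row[f] for row in rows}) for f in fields}

def one_hot(row, vocab):
    """Concatenate one-hot vectors per field; unseen values encode as all 0s."""
    vec = []
    for field, values in vocab.items():
        vec.extend(1 if row[field] == v else 0 for v in values)
    return vec

def debug_encode(rows, vocab, n=10):
    """Print the first n encodings and return the indices of all-zero vectors."""
    zero_rows = []
    for i, row in enumerate(rows[:n]):
        vec = one_hot(row, vocab)
        if not any(vec):
            zero_rows.append(i)
        print(i, row, vec, "<-- all zeros" if not any(vec) else "")
    return zero_rows

# Illustrative data: the second sample differs from the vocabulary only in case and whitespace.
train = [{"country": "Belgium", "stage": "Seed"}, {"country": "Kenya", "stage": "Growth"}]
vocab = build_vocab(train, ["country", "stage"])
sample = [{"country": "Kenya", "stage": "Seed"}, {"country": "kenya ", "stage": "seed"}]
zero_rows = debug_encode(sample, vocab)
print(zero_rows)  # [1]
```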
Monday, March 13, 2023
Encoding to all 0s is still an issue with the dataset; slight progress has been made here
Because training accuracy remains low, Kevin thinks some columns are still being encoded as 0s when they should be 1s
Kevin will share the anonymized data sample with Carmen so that both can work on it
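One way to test this hypothesis is to sum each encoded column over the whole dataset: a column that is never 1 is either a category that truly never occurs or a sign that the encoder is missing matches. This is a sketch assuming the encoded data is a list of equal-length 0/1 vectors with a name for each column; the names below are hypothetical.

```python
def never_one_columns(encoded_rows, column_names):
    """Return the names of columns that are 0 in every encoded row."""
    totals = [0] * len(column_names)
    for vec in encoded_rows:
        for j, value in enumerate(vec):
            totals[j] += value
    return [name for name, total in zip(column_names, totals) if total == 0]

# Illustrative encoded rows and hypothetical column names.
encoded = [[1, 0, 0], [1, 0, 1], [0, 0, 1]]
names = ["stage=Seed", "stage=Growth", "country=Kenya"]
print(never_one_columns(encoded, names))  # ['stage=Growth']
```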
Blogpost
The data description is done, but it needs to pass by Solve before we add column-level detail to the blog post
Carmen added an introduction to the blogpost
VUB meeting times conflict with our schedule; new meeting time: Tuesdays at 9am ET, every two weeks
Tuesday, March 21, 2023
Code
Both will continue to work on the encoding of the data. We feel we are close!
Blogpost
Kevin has added the data description!
Carmen will add more text about the problem we’re trying to solve.
Thursday, April 6, 2023
Code
Problems with the code have been fixed! Yay!
One caveat is having to pare down my current dataset
Kevin to go back and enrich the data
Meeting with Rebecca from Solve to get admin access to the data, which will give us the enrichments
Q: How might we account for multi-select fields?
Carmen will look into updating the code to allow for this. Since we already use one-hot encoding, this should work in principle.
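For a multi-select field, one-hot encoding generalizes to a multi-hot encoding: one column per option, with a 1 in every column the applicant selected. This sketch assumes multi-select answers are stored as a single delimited string; the `;` delimiter and the option names are assumptions about the export format.

```python
def multi_hot(cell, options, sep=";"):
    """Encode a delimited multi-select cell as a 0/1 vector over the known options."""
    # Strip whitespace around each selected option and ignore empty pieces.
    selected = {part.strip() for part in cell.split(sep) if part.strip()}
    return [1 if option in selected else 0 for option in options]

options = ["Education", "Health", "Climate"]  # hypothetical option list
print(multi_hot("Health; Climate", options))  # [0, 1, 1]
print(multi_hot("", options))                 # [0, 0, 0]
```

Unlike plain one-hot encoding, a row can have several 1s (or none) within a multi-select field's block of columns, so any check that expects exactly one 1 per field has to be relaxed for these fields.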
Q: What is the sample_date variable for LUCID?
It is used to determine the format of the input data points. Carmen will ask her colleague whether LUCID can be given just the dimension of the input space instead.
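To illustrate why the input dimension alone might suffice: in a gradient-based inverse-design loop, the starting inputs only need the right shape before they are pushed toward a target output. The toy below uses a fixed logistic-regression model with made-up weights and projected gradient ascent on the inputs. It is a sketch of the general technique under those assumptions, not LUCID's implementation or API.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of a positive decision under a fixed logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def inverse_design(weights, bias, dim, n_samples=5, steps=200, lr=0.1, seed=0):
    """Start from random inputs of size `dim` and push them toward a positive prediction.

    Only the input dimension is needed for initialization; values are clipped to [0, 1]
    on the assumption that the real inputs are one-hot encoded.
    """
    rng = random.Random(seed)
    samples = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    for _ in range(steps):
        for x in samples:
            p = predict(weights, bias, x)
            # For logistic regression, the gradient of log p with respect to x_j is (1 - p) * w_j.
            for j in range(dim):
                x[j] = min(1.0, max(0.0, x[j] + lr * (1.0 - p) * weights[j]))
    return samples

weights, bias = [2.0, -1.5, 0.5], -0.5  # made-up model parameters
canonical = inverse_design(weights, bias, dim=3)
print([round(predict(weights, bias, x), 3) for x in canonical])
```

Each sample drifts toward the corner of the input space that the model scores highest (here features 1 and 3 on, feature 2 off), which is the kind of "canonical" input an inverse-design method inspects.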
Blogpost
Q: Should we use references in the blog post or not?
Yes, it is good practice.
Next steps
Kevin: start a cleaning script
Enriched data from Solve (ask about 2023)
2023 data changes in metadata
Kevin will get the application field compendium
Carmen: continue coding and code interpretation