
Updates on VUB Collaboration

Published on Apr 11, 2023

A collection of running meeting notes among collaborators at KFG and VUB.


Monday, December 19, 2022

  • Meeting between Kevin / Carmen

  • Determining the PubPub presence we would like to have:

    1. This document will be used for running updates of our meetings (Kevin will consolidate previous meeting notes here)

    2. Research notes:

      • LUCID applied to our research question; blog post explaining, in layman’s terms, the steps we take procedurally

      • Write an explanation about the LUCID results (living results, probably no access to Colab just yet)

      • Line-by-line code breakdown of the model

    3. Writing about the VUB grants data collected (Vincent should have an update in January)

    4. Data-sharing agreements (including VUB’s data-sharing policies) in January

    5. Mirror bias modeling: solicitations from SOLVE about various challenges to model language from grant applications


Thursday, January 5, 2023

  1. Added some informational text to the data; the final column is “Decision”, which is determined by the following Excel formula:

    IF(SO("Published","Selected","Finalist","Semi-Finalist"),"YES","NO")
  2. Debugging of code in Colab notebook

    • Possible next steps?

    • Error in code:

      Line #110663 (got 1 columns instead of 15)
      Line #110664 (got 1 columns instead of 15)
      Line #110665 (got 1 columns instead of 15)
      Line #110666 (got 5 columns instead of 15)
      Line #110667 (got 1 columns instead of 15)
    • This had to do with how Python parsed the CSV. I solved it by using a different encoding (not UTF-8) for the CSV

    • Some TypeErrors occurred:

    • Results from the running script:

      Data count: 9704
      Training data count: 7278
      Test data count: 2426
  3. Work on an introduction to LUCID as applied to grant application data (draft to a Pub here)

    • Splitting up of tasks:

      • Kevin: describing the data (I will have to talk to SOLVE about this and see how I might do this in a Pub)

      • Carmen:

  4. Check up on VUB grants data collected by Vincent

  5. Data sharing agreements from VUB have been uploaded to the data sharing agreements repo here!

  6. Action items:

    • Start off with all data together, and then do individual calls

    • Do not include drafts in the first run

    • Carmen, Kevin: Debug the NoneType issue, looking at the person_inputs variable and what it should be

    • Kevin: Update the metadata to be more specific around Drafts

    • Carmen: Update the structure to the LUCID Pub
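
For reference, the “Decision” rule from item 1 can be sketched in Python. This is a minimal sketch, assuming each application carries a single status string that is compared against the four labels in the formula; the function name `decision` and the example status "Declined" are hypothetical and not taken from the dataset.

```python
# Statuses that count as a positive decision, as listed in the formula above.
POSITIVE_STATUSES = {"Published", "Selected", "Finalist", "Semi-Finalist"}


def decision(status):
    """Return "YES" if the status is one of the positive labels, else "NO".

    Assumption: the status is a single string per application; missing
    values (None) are treated as "NO".
    """
    if status is None:
        return "NO"
    return "YES" if status.strip() in POSITIVE_STATUSES else "NO"


if __name__ == "__main__":
    # "Declined" is a made-up example status, used for illustration only.
    for s in ["Finalist", "Declined", None]:
        print(s, "->", decision(s))
```

Keeping the positive labels in one set makes it easy to revisit the rule later, for example once the metadata around Drafts is made more specific.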

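Column-count errors like the ones logged under item 2 usually mean that some rows split into a different number of fields than expected. One way to locate such rows before loading the data is sketched below with Python's standard `csv` module. It is only an illustration: the helper name `find_bad_rows` and the three-column sample are hypothetical, and it is not the fix that was actually applied in the notebook.

```python
import csv
import io


def find_bad_rows(lines, expected_columns):
    """Yield (row_number, field_count) for rows whose width differs.

    `lines` is any iterable of text lines (an open file works). The csv
    module keeps quoted fields together, including embedded commas and
    line breaks, so row numbers here count logical rows, not raw lines.
    """
    reader = csv.reader(lines)
    for row_number, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            yield row_number, len(row)


if __name__ == "__main__":
    # Made-up sample: the last row is missing a field.
    sample = io.StringIO(
        'id,title,status\n'
        '1,"A title, with a comma",Finalist\n'
        '2,broken row\n'
    )
    print(list(find_bad_rows(sample, expected_columns=3)))
```

For a real file, the same function can be given a handle opened with `open(path, newline="", encoding=...)`, using whichever encoding the export actually has (these notes do not name it).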

Thursday, February 23, 2023

  1. Trained the initial model. Here are the steps Kevin was able to reach in the shared notebook.

    1. Encoding of data

    2. Training of the initial model

    3. Challenges: it seems like the data might be encoded incorrectly, or do we need to add something to the model?

  2. Kevin updated some data descriptions in the post stub. Will finish that later today!

  3. Kevin (for next time): The data is encoding as all 0s, which does not seem right; try 10 data entries and print out the results as we encode

  4. Carmen: Start expanding the intro of the blogpost.
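
The check in item 3 (encode ten entries and print each result) could look roughly like the sketch below. It is a minimal, dependency-free illustration rather than the notebook's encoder: the helper names and category values are hypothetical, and the mismatch it demonstrates (stray whitespace and casing) is only one possible cause of all-zero vectors, not a confirmed diagnosis.

```python
def build_vocabulary(values):
    """Map each distinct category to a column index, in sorted order."""
    return {category: i for i, category in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the value's column.

    Values missing from the vocabulary yield an all-zero vector, which is
    one way an encoder can silently produce nothing but 0s.
    """
    vector = [0] * len(vocabulary)
    index = vocabulary.get(value)
    if index is not None:
        vector[index] = 1
    return vector


if __name__ == "__main__":
    # Hypothetical category values; note the stray whitespace and casing.
    raw = ["Health", "health ", "Climate", "Learning", "Health",
           "Climate", " Learning", "Health", "Climate", "Learning"]
    # The vocabulary is built from cleaned values, but raw values are encoded,
    # so uncleaned entries fall through to all zeros.
    vocabulary = build_vocabulary(v.strip().title() for v in raw)
    for value in raw[:10]:
        encoded = one_hot(value, vocabulary)
        flag = "" if sum(encoded) == 1 else "  <-- all zeros"
        print(repr(value), encoded, flag)
```

Printing the raw value next to its vector makes it easy to spot entries that never match a vocabulary key.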


Monday, March 13, 2023

  1. Encoding of all 0s is still an issue with the dataset; slight progress made here

    1. Kevin thinks some columns are still being encoded as 0s even though they should be 1s, because the training accuracy remains low

    2. Kevin will share the anonymized data sample with Carmen so both can work on the issue

  2. Blogpost

    1. Data description is done; it needs to be run by Solve first before adding column-level detail to the blog post

    2. Carmen added an introduction to the blogpost

  3. VUB meeting times conflict with our schedule; new meeting time: Tuesdays at 9am ET, every two weeks


Tuesday, March 21, 2023

  1. Code

    • Both will continue to work on the encoding of the data. We feel we are close!

  2. Blogpost

    1. Kevin has added the data description!

    2. Carmen will add some more text about the problem we’re trying to solve.


Thursday, April 6, 2023

  1. Code

    • Problems with the code have been fixed! Yay!

      • One caveat is having to pare down my current dataset

      • Kevin to go back and enrich the data

        • Meeting with Rebecca from Solve to get admin access to the data; will get enrichments

    • Q: how might we account for multi-select fields?

      • Carmen will see how to update the code to allow for this. As we have one-hot encoding this should normally work.

    • Q: What is the sample_date variable for LUCID?

      • Used to determine format of input data points. Carmen will discuss with her colleague if it is possible to just give LUCID the dimension of the input space.

  2. Blogpost

    1. Q: Should we use references in the blog post or not?

      1. Yes, it is good practice

  3. Next steps

    • Kevin to start on the cleaning script

    • Enriched data from Solve (ask about 2023)

      • 2023 data changes in metadata

    • Kevin will get the application field compendium

    • Carmen to continue on coding and code interpretation
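
On the multi-select question under Code: one common way to extend one-hot encoding is a "multi-hot" vector, with a 1 for every selected option. The sketch below assumes multi-select answers arrive as a single delimited string; the `;` separator, the helper names, and the sample values are all assumptions for illustration, not details of the Solve data or of LUCID.

```python
def split_choices(cell, separator=";"):
    """Split a multi-select cell into a list of cleaned choices."""
    if not cell:
        return []
    return [part.strip() for part in cell.split(separator) if part.strip()]


def multi_hot(cell, vocabulary, separator=";"):
    """Encode a multi-select cell as a 0/1 vector with one 1 per choice."""
    vector = [0] * len(vocabulary)
    for choice in split_choices(cell, separator):
        if choice in vocabulary:
            vector[vocabulary[choice]] = 1
    return vector


if __name__ == "__main__":
    # Hypothetical multi-select answers, including an empty one.
    cells = ["Health; Climate", "Learning", "", "Climate;Learning;Health"]
    choices = sorted({c for cell in cells for c in split_choices(cell)})
    vocabulary = {choice: i for i, choice in enumerate(choices)}
    for cell in cells:
        print(repr(cell), multi_hot(cell, vocabulary))
```

The vector length stays the same as for a single-select field, so the input dimension the model sees does not change; only the number of 1s per row does.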
