Introduction
We recently concluded our 2024 Data For Good Cohort, where we welcomed 24 new volunteers into our community. As part of the cohort program, the volunteers take part in a number of learning and development seminars and in-person networking events, but without a doubt the highlight is always our Hackathon. The hackathon is the perfect opportunity for our cohort members to get a better understanding of how a GDI project works, including communicating with a charity, understanding their unique challenges, and finally building solutions that meet the charity's needs.
In 2024 we were lucky to partner with Ersilia, a nonprofit organisation dedicated to making drug discovery more accessible. This was one of our best hackathons to date, and in this blog piece we share a bit about the journey.
Introducing Ersilia
Ersilia is at the forefront of open-source drug discovery, providing computational tools to lower the barriers to finding new treatments for infectious diseases. They focus on neglected tropical diseases that disproportionately affect the Global South, a region where the global burden of disease and research outputs are starkly imbalanced. With less than 5% of scientific publications originating from low-income countries (LICs) and infectious diseases accounting for up to 60% of total deaths in these regions, Ersilia's work is crucial. They offer low-code tools like the Ersilia Model Hub, ZairaChem, and ChemSampler, housing over 150 models to aid scientists in resource-constrained settings.
The Hackathon Challenge: Innovating Impact Metrics
For this hackathon, GDI cohort volunteers worked alongside Ersilia to develop new "Impact Metrics" to better measure and showcase the organisation's achievements. The challenge was three-fold:
Define and develop ideas for impactful metrics that Ersilia can utilise.
Automate the collection of these metrics, streamlining data gathering processes.
Present the metrics in an innovative and engaging data product.
Currently, Ersilia relies on a semi-automated collection of data, using spreadsheets to compile data such as publications, code repositories, and funding information. This hackathon aimed to explore automated solutions for these processes, enhancing the clarity and accessibility of Ersilia's impact reporting.
Judging Criteria: Evaluating Innovation and Relevance
The teams' solutions were evaluated based on the following criteria:
Practical/Actionable Insights (40%): Emphasis on clear, actionable insights that address relevant considerations with rigour and thoughtfulness.
Relevance to the Problem Statement (40%): Understanding and addressing the client's issues, with impactful insights tailored to Ersilia's context.
Final Presentation (10%): Clarity, structure, and engagement in presenting the solutions.
Creativity/Originality (10%): Innovative approaches and creative thinking in solving the challenge.
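To illustrate how the weightings above combine into a single result, here is a minimal sketch in Python. The weights come from the criteria listed above; the 0-10 scoring scale, the function name `weighted_score`, and the example scores are assumptions made purely for illustration and are not the judges' actual method or results.

```python
# Weights taken from the judging criteria above (they sum to 100%).
WEIGHTS = {
    "insights": 0.40,      # Practical/Actionable Insights
    "relevance": 0.40,     # Relevance to the Problem Statement
    "presentation": 0.10,  # Final Presentation
    "creativity": 0.10,    # Creativity/Originality
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-10 scale) into one total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for illustration only, not the real results.
example = {"insights": 8, "relevance": 9, "presentation": 7, "creativity": 6}
total = weighted_score(example)
print(f"Weighted total: {total:.2f}")
```

With insights and relevance each carrying 40%, a one-point difference in either of those criteria moves the total four times as far as the same difference in presentation or creativity.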
The Team Solutions
Below, each team summarises the solution they prepared for Ersilia, showing just how varied and creative the teams were in tackling this challenge.
Team 1 Solution
Caitlin Westney (C), Kevin Beeson, Yashodhya Wijesinghe, Sash Gowrieswaran, Praneeth Saturi, Victor Zhen Zhuang
Inspired by Ersilia’s focus on three United Nations Sustainable Development Goals, Group 1 structured its approach and dashboard around the following: Good Health and Wellbeing (UNSDG 3), Reduced Inequalities (UNSDG 10) and Partnerships for the Goals (UNSDG 17).
The first dashboard tab helps Ersilia track progress towards good health and wellbeing by comparing Ersilia’s website traffic with global Disability Adjusted Life Years (DALYs) data from WHO. The data can be filtered by location, disease tag, and model, allowing Ersilia to assess its impact relative to global needs.
The second tab addresses global research and funding inequalities, guiding team members to focus on areas where they can reduce inequalities. It uses DALYs data to highlight disease prevalence, along with OpenAlex data (collected via an API) on citation countries and UNSDG topics of Ersilia’s research. Filters by region help Ersilia compare where their research is used versus where it’s needed. This tab also includes two funding tools: a link to G-Finder, an external dashboard tracking global funding by disease and country, and a live Google Alerts RSS feed to help streamline research on funding opportunities.
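As a rough idea of what the citation-country step could look like, here is a minimal sketch in Python. It is not the team's actual code. It assumes the citing works have already been fetched from OpenAlex and follow the shape of OpenAlex work records, where each work has an `authorships` list whose entries carry `institutions` with a `country_code`. The function name `count_countries` and the sample records are hypothetical.

```python
from collections import Counter


def count_countries(works: list[dict]) -> Counter:
    """Count works per author-institution country code.

    Each work is counted at most once per country, even if several
    of its authors are based in the same country.
    """
    counts: Counter = Counter()
    for work in works:
        countries = set()
        for authorship in work.get("authorships") or []:
            for institution in authorship.get("institutions") or []:
                code = institution.get("country_code")
                if code:  # skip institutions with no known country
                    countries.add(code)
        counts.update(countries)
    return counts


# Hypothetical records shaped like OpenAlex works, for illustration only.
sample_works = [
    {"authorships": [
        {"institutions": [{"country_code": "KE"}]},
        {"institutions": [{"country_code": "KE"}, {"country_code": "ES"}]},
    ]},
    {"authorships": [{"institutions": [{"country_code": "KE"}]}]},
    {"authorships": [{"institutions": [{"country_code": None}]}]},
]

citing_countries = count_countries(sample_works)
print(citing_countries.most_common())
```

Counting each work once per country keeps a paper with many co-authors at one institution from dominating the totals, which matters when the figures are compared against disease burden by region.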
The third tab emphasises Ersilia’s outreach, focusing on community and partnerships. It visualises newsletter and research publication metrics. For publications, team members can see topic breakdowns and relevant research details. The outreach section includes a word cloud of newsletter topics and a detailed content breakdown. The team also visualised newsletter clicks, opens, and recipients, helping Ersilia track performance and identify content that resonates with its community.
Team 2 Solution
Pradeep Jones (C), Tania Sadhani, Ojas Pradhan, Devika Prasad, Zoren Liu, Ankita Singh
In their presentation, Team 2 outlined a data-driven approach to amplifying Ersilia's global health impact by developing an impact metric system. They followed a four-step approach: (1) defining impact areas, (2) identifying key stakeholders, (3) formulating key questions for impact metrics, and (4) selecting and implementing these metrics on a Looker Studio dashboard. Team 2 focused on two core impact areas: research influence and community growth. They measured research influence through publications, citations, and model downloads, while community growth was evaluated using metrics such as the number of affiliated organizations, events, and digital content reach. Their goal was to create a comprehensive framework for Ersilia to track and showcase its impact effectively.
To deliver the solution, Team 2 utilized a toolkit that included Looker Studio, Python, R, GitHub Actions, and web scraping. They used GitHub Actions and a Python script with the unofficial Medium API to track Medium article publications, Docker Hub data to monitor model usage, and an R package to scrape citation data from Google Scholar. Mailchimp data was also used to assess community engagement through newsletters. These tools enabled Team 2 to effectively measure research influence and community growth, providing Ersilia with valuable insights for showcasing its impact and guiding future strategies.
Team 3 Solution
Stephanie McDonald (C), Tina Mu, Nam Hung Ngu, Callum Lee, James Baker, Keshav Asaithambi
In their solution, Team 3 aimed to create a dashboard that not only showcased Ersilia's current impact but also highlighted their potential for even greater future impact. Their goal was to assist Ersilia in identifying where to focus their energy and resources, using data to tell a story about who Ersilia is and why their work matters. Team 3 believed this approach would allow Ersilia to present a more complete picture of their motivation and vision, as well as their impact on sustainable drug discovery in the Global South.
Team 3 began their dashboard by setting the context with the impact of a single disease, malaria. During their exploration, they discovered the Python package "owid-catalog" maintained by Our World in Data. This package provided easy access to a wide range of the data displayed on the Our World in Data website. Team 3 created a script to read, format, and save this data into a Google Sheet, which was then accessed via their Looker Studio dashboard.
After establishing the context, Team 3 broke down Ersilia’s impact into three key areas: their models and usage, their publications and citations, and their community growth and events. Using data from Ersilia and OpenAlex, they identified where Ersilia was making an impact globally, particularly in the Global South. They aligned the metrics with the UN’s Sustainable Development Goals to clearly demonstrate Ersilia’s contribution to building a better future. Additionally, Team 3 focused on Ersilia’s growth over the years to emphasize that their impact is continuously increasing. They also provided recommendations on data Ersilia could start tracking to further enhance the visualization and presentation of their impact.
Team 4 Solution
Harshil Shah (C), Helen Liang, Uesi Unasa, Olga Shabalina, Haley Hutchins, Lara Najim
In their project, Team 4 aimed to simplify Ersilia’s grant application process and support their mission of advancing drug discovery for tropical diseases in the Global South. They focused on three areas to develop impact metrics: Influence/Reach, Weekly Model Usage, and Grant Application Tracking. To automate data collection and presentation, they built a dashboard using Looker Studio and Google Sheets.
Team 4 utilised data from Ersilia's Medium posts, volunteer network, and publications to create a dashboard highlighting their global influence. Key metrics included Medium post views, volunteer numbers by organisation, and publication views and citations by disease area.
To track Ersilia’s model usage, the team automated the recording of weekly model pulls from Docker Hub using a Python-based GitHub Action. This allowed them to display total weekly pulls, top-performing models, and usage by disease area, helping Ersilia monitor trends over time.
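Docker Hub reports pulls as a running total per repository, so a weekly figure has to be derived from the difference between two snapshots. Here is a minimal sketch of that step in Python. It is not Team 4's actual script: it assumes a scheduled job has already recorded one cumulative pull count per week, and the function name `weekly_pulls` and the sample numbers are hypothetical.

```python
from datetime import date


def weekly_pulls(snapshots: list[tuple[date, int]]) -> list[tuple[date, int]]:
    """Turn cumulative pull counts into pulls per period.

    `snapshots` holds (snapshot date, cumulative pull count) pairs.
    Each result is labelled with the later date of the pair it covers.
    """
    ordered = sorted(snapshots)  # oldest snapshot first
    return [
        (later_day, later_count - earlier_count)
        for (_, earlier_count), (later_day, later_count) in zip(ordered, ordered[1:])
    ]


# Hypothetical weekly snapshots for one model, for illustration only.
sample = [
    (date(2024, 9, 1), 1200),
    (date(2024, 9, 8), 1240),
    (date(2024, 9, 15), 1265),
]

for week_ending, pulls in weekly_pulls(sample):
    print(week_ending.isoformat(), pulls)
```

Storing the raw cumulative counts and deriving the weekly values afterwards means a missed run only widens one period rather than losing data.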
Team 4 also developed a grant application tracking system, showcasing metrics such as success rates and quarterly funding outcomes. Additionally, they automated the search for potential grants, providing Ersilia with a streamlined process to identify new funding opportunities.
The Outcome: Celebrating Success and Innovation
The hackathon concluded with a presentation from each team back to our judging panel. Needless to say, we were absolutely blown away by the range of innovative solutions, demonstrating the creativity and skills of our new volunteers.
In the end, there could only be one winner, and after a very comprehensive judging process only half a point separated the teams (by far our closest in years).
The winning team was Team 1, made up of Caitlin Westney, Kevin Beeson, Yashodhya Wijesinghe, Sash Gowrieswaran, Praneeth Saturi, and Victor Zhen Zhuang, with Shivanka as their mentor. They impressed the judges by going beyond internal data and collecting data from external sources such as OpenAlex, G-Finder, WHO and Our World in Data.
One of the most positive outcomes was that Ersilia were interested in implementing a number of the solutions that the teams presented, going to show that our volunteers are more than ready to jump on projects.
"For a small non-profit like ours, demonstrating impact as early as possible is key. All four participant teams have truly provided remarkable insights into our organisation, coming up with creative ways of telling our story via dashboards, automated data collection, and much more" - Miquel
A Massive Thank You
The team at GDI would like to say a massive thank you to a number of people for ensuring the success of this hackathon.
To our incredible charity partner Ersilia for being super engaged, thoughtful and encouraging throughout the entire process. To our new volunteers, the dedication and hard work put into this hackathon was very clearly seen and much appreciated.
And finally to our internal GDI team for helping organise the operational aspects of the hackathon. To our Fellows Alistair, Joe, Shivanka, Dave and Eleanor for supporting the teams along the way. And a massive thanks to Berta for leading from start to finish. You did an incredible job!
By Andrew Smith (Director of Communications)
About GDI:
The Good Data Institute (established 2019) is a registered not-for-profit organisation (ABN: 6664087941) that aims to give not-for-profits access to data analytics (D&A) support & tools. Our mission is to be the bridge between the not-for-profit world and the world of data analytics practitioners wishing to do social good. Using D&A, we identify, share, and help implement the most effective means for growing NFP people, organisations, and their impact.