Natural Language Processing Project Takes Shape
July 15, 2019
Over the course of the 2018-2019 academic year, a new project took shape at the Dahdaleh Institute. Led by PhD candidate Tino Kreutzer, the project, Improving Humanitarian Needs Assessments Through Natural Language Processing, brings artificial intelligence (AI) to humanitarian settings.
Current approaches to assessing humanitarian needs through surveys often require interviewers to convert complex, open-ended responses into simplified categorical data, but the capacity to do so effectively is limited. Natural Language Processing (NLP), a form of AI, provides potentially far-reaching new opportunities to capture qualitative data from voice responses and analyze it for relevant content to better inform humanitarian assistance decisions.
The Dahdaleh Institute recruited a computer science student with experience in machine learning to systematically assess the quality of various transcription and translation tools, focusing initially on English and Arabic.
The Dahdaleh Institute hosted the first International Health Emergency Data Science Workshop. The International Rescue Committee (IRC) presented one of five challenges at the event: because of a lack of human and financial resources, it is difficult to integrate qualitative methods into humanitarian surveys and feedback collection. During an all-day design and feasibility session, the workshop explored whether and how NLP can be used to automate transcription, translation, and analysis of open-ended responses. The working group comprised specialists from six countries and 10 organizations, including ACAPS, Elrha, Harvard Humanitarian Initiative, IRC, NetHope, OCHA, Pivotal, Purple Compass, World Food Programme (WFP), and York University.
Soon after the event, the Dahdaleh Institute convened a smaller working group for an all-day discussion to better define the challenges faced by multiple organizations, prioritize next steps, and discuss piloting options.
The Dahdaleh Institute began working to identify a pilot project site that would require building a transcription and translation model from scratch during an ongoing complex emergency.
Several York researchers, together with multiple partners, began describing the challenge and the proposed solution in a formal article, which will be published in a peer-reviewed journal in early 2020.
Project lead Tino Kreutzer presented project plans and early results on the ethical challenges at the International Studies Association and Ethics and Humanitarian Research conferences.
The Dahdaleh Institute team began working more closely with a partner organization aiming to create NLP models for machine translation in low-resource languages.
To support this work, the Dahdaleh Institute recruited a recent computer science graduate to create a working model for classifying open-ended responses to survey questions.
Work has continued through the summer to build partnerships and move towards piloting the project.