Jointly tackling the challenge of unstructured data to improve humanitarian response
October 2018 – There is no such thing as too much information, right? Wrong! Sometimes a large amount of information creates noise, making it harder for humanitarians to identify the key characteristics of a crisis situation. A system to easily extract, process and meaningfully interpret information from all available data is therefore a priority for the sector, and can contribute to faster, more informed decision-making in crisis situations.
This is where the DEEP project comes into play (more about this below). At the recent Port Hackathon, hosted at the CERN IdeaSquare, the IFRC IM team, with colleagues from the Joint IDP Profiling Service (JIPS), took on the challenge together with a team of 11 hackers to further enhance the platform’s capacities, using optical character recognition and machine learning elements. The event saw more than 60 professionals from all over the world come together to “build working prototypes and tangible solutions for real-life humanitarian problems”.
The DEEP project & the Hackathon challenge
For more informed decisions and rapid response to humanitarian crises, we need to answer three key questions: who is affected, where is it happening, and how bad is it? This is where the DEEP platform comes in. Initiated in the aftermath of the 2015 Nepal earthquake, the DEEP supports users by:
- allowing the structured review of a large number of documents,
- identifying key information, and
- extracting relevant data for prompt analysis.
The project supports collaboration and inter-agency analysis and is a concrete response to the Grand Bargain. As well as IFRC, DEEP governance members include ACAPS, IDMC, OCHA, UNHCR, OHCHR, UNICEF and JIPS.
During the four weeks of preparation and 60 hours of the final hackathon, a team of 11 hackers, calling themselves the ‘DEEPER’ team, worked together to understand how the platform can help address four specific challenges that analysts face in the midst of a humanitarian crisis:
- Within the overwhelming amount of information, what is new about a specific piece of information?
- How can we get an overview of what entities, such as people and places, are mentioned in documents?
- At a glance and without having to read through the full text, how can we understand what a specific document is about?
- How bad is the situation? What do the documents tell me in terms of numbers?
Benefitting from the varied backgrounds and expertise of its members, the team was able to produce four concrete outputs during the hackathon:
1. Creation of a similarity score for new documents
As an analyst working with a large number of documents ranging from PDFs to web articles, it is important to know if a new piece of information contains new insights or if it is very similar to existing documents. The team was able to set up a mechanism by which new documents could be scored for their uniqueness against all other available data sources in an analyst’s library.
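The post does not specify how the score is computed. One common approach is to compare bag-of-words vectors with cosine similarity and report uniqueness as one minus the best match against the library. The sketch below is a hypothetical illustration of that idea in plain Python (the function names are ours, not the DEEP codebase's):

```python
import math
from collections import Counter

def vectorize(text):
    # Simple bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def uniqueness(new_doc, library):
    # 1.0 = entirely new; 0.0 = duplicates an existing document.
    new_vec = vectorize(new_doc)
    best_match = max((cosine(new_vec, vectorize(d)) for d in library),
                     default=0.0)
    return 1.0 - best_match
```

A production system would use weighted vectors (e.g. TF-IDF) or document embeddings rather than raw word counts, but the scoring logic is the same.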
2. Improved extraction of entities in documents
The process of attempting to identify what geographic locations are mentioned in a document can be time-consuming as it requires manually reading through a number of pages, or looking for words that may resemble the name of a location. The DEEPER team developed a technique to automatically extract and visualize names of places, people, events, organisations and a number of other types of entities from reviewed documents.
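The blog does not say which named-entity recognition technique the team used; production systems typically rely on trained statistical models. The simplest way to illustrate the idea, though, is a gazetteer lookup over known names. Everything below, including the tiny GAZETTEER, is an illustrative assumption rather than the actual DEEP implementation:

```python
import re
from collections import defaultdict

# Hypothetical lookup table mapping known names to entity types.
GAZETTEER = {
    "Kathmandu": "LOCATION",
    "Nepal": "LOCATION",
    "UNICEF": "ORGANISATION",
    "OCHA": "ORGANISATION",
}

def extract_entities(text):
    # Return entities found in the text, grouped by type.
    found = defaultdict(set)
    for name, etype in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found[etype].add(name)
    return {etype: sorted(names) for etype, names in found.items()}
```

A real extractor would also recognise names it has never seen before, which is what trained NER models add over a static list.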
3. Enhanced automatic tagging functions
Analysts need to know what a document is about. Is it relevant to food, health, or logistics needs, or another topic that they need to be aware of? By applying machine learning methods, the DEEPER team was able to improve the overall precision of the tagging suggestions to 71%.
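The machine learning model behind the tagging suggestions is not described in the post. A toy keyword-scoring tagger conveys the basic idea of mapping a document to humanitarian sectors; the sector names and keyword sets below are illustrative assumptions, not DEEP's actual taxonomy or model:

```python
# Hypothetical sector keywords for illustration only.
SECTOR_KEYWORDS = {
    "food": {"food", "nutrition", "ration", "hunger"},
    "health": {"clinic", "disease", "vaccination", "hospital"},
    "logistics": {"transport", "warehouse", "supply", "convoy"},
}

def suggest_tags(text, threshold=1):
    # Suggest every sector whose keyword overlap meets the threshold.
    tokens = set(text.lower().split())
    scores = {sector: len(tokens & keywords)
              for sector, keywords in SECTOR_KEYWORDS.items()}
    return sorted(sector for sector, n in scores.items() if n >= threshold)
```

A trained classifier replaces the hand-written keyword sets with weights learned from previously tagged documents, which is where the precision gains reported above come from.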
4. New models to extract tabular data from PDFs
To improve analysis in humanitarian crises we need to be able to extract numerical data from tables and graphs included in documents, which are most commonly found in PDF format. However, while the human brain can grasp the essence of a table at a glance, this is much more difficult for a computer. The team worked on enhancing the DEEP platform to capture and extract tabular data from PDF documents, potentially resolving this very real challenge.
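The post does not describe the extraction models themselves. For text already pulled out of a PDF, one simple heuristic is to treat runs of two or more spaces as column separators and zip each data row against the header row. The sketch below shows that assumption only, with hypothetical names; real PDF table extraction also has to recover the text layout from the page first:

```python
import re

def parse_table_lines(lines):
    # Split each non-empty line on runs of 2+ spaces (column gaps),
    # then pair every body row with the header row.
    rows = [re.split(r"\s{2,}", line.strip())
            for line in lines if line.strip()]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

For example, a two-column table of affected populations becomes a list of dictionaries that downstream analysis code can aggregate directly.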
The solutions developed by the team are all open source and available for use by other humanitarian actors and platforms, not just limited to powering DEEP. All code developed in the challenge is available on GitHub, as is the code for the core DEEP platform.
Going DEEP to understand emergency needs
The IFRC supports the DEEP in order to improve our ability to rapidly structure and analyse secondary data, adding to our understanding and analysis of emergency needs and helping us to choose the right response. As part of a wider effort to revise and improve our emergency needs assessment approach, we are investigating how technologies such as DEEP can support analysis. We have put together a video to present the approach that IFRC has developed as part of its surge optimization process, aiming to inform Emergency Plans of Action with strong, timely and reliable evidence on assessed needs and capacities, especially in the case of major-scale disasters.
One of the things which excites us most about the DEEP is that it opens up the potential for collaboration between analysts across organisations. While the Red Cross Red Crescent network has unparalleled global coverage and access to communities affected by disasters, we also realise that our analysis is strengthened if we exchange our knowledge with other organisations that have specialist expertise and functions. This partnership approach is behind our exchanging participants and facilitators on analysis courses with ACAPS and OCHA over the past year, as well as our contributing to efforts to build common analytical frameworks through the Joint Inter-Agency Group (JIAG). We are keen to get others involved in this shared journey and recently co-hosted a session with JIPS at the GeONG conference in Chambéry aiming to do just that. This is because we realise that the DEEP can only improve our analysis if we share our collective brain power, meaning we need to also hack our organisations! Now, there’s an idea for next year’s hackathon…
The content has been adapted with permission from an original blog released by JIPS. Co-authors: Wilhelmina Welsch (JIPS) and Luke Caley (IFRC).
(photo credit: The Port hackathon, used with permission, October 2018)