[ed. note. Cross-posted – “Data Collection Basics” by Fanor Camacho, Humanitarian Data Specialist/Information Management Delegate, IFRC Americas Regional Office.]
Data collection is a critical point in the data cycle: high-quality data makes high-quality analysis possible. Several interrelated aspects need to be considered, and they are described below.
Installed capacity refers to the internal capacity of the team that collects the data. It is necessary to identify which competences exist internally and whether members have received any training in data collection: some members may have programming skills, others skills in analysis, communication or leadership. These aspects should be considered in order to get the most out of each task in the process.
It is also necessary to consider the number of people involved in the data collection process and the time they can devote to this work. With more time available, we could collect more data and measure additional variables that can strengthen the analysis.
The choice of data collection application is directly related to factors such as the installed capacity of the team, the technological resources available, the budget and the context. Many applications allow data collection, ranging from paper surveys to electronic devices.
The choice of technology is critical for data collection. What matters at this point is to think about how the work can be carried out most efficiently, optimizing collection and analysis time, information quality and sustainability.
The cost of data collection must be evaluated before carrying out the action. Resources have probably been allocated to the purchase of equipment and to training, but it is also important to consider costs related to software licenses, data storage and human resources, as well as indirect costs of the process such as transportation, per diem and snacks.
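As a rough illustration of how the cost items above could be added up before an exercise, here is a minimal Python sketch. The class name `CostItem`, the function `summarize_budget` and every figure in the example are hypothetical and chosen only for illustration; they are not taken from any real budget.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    """One budget line. All names and amounts used below are hypothetical."""
    name: str
    unit_cost: float      # cost per unit, in a single currency
    quantity: float       # number of units (devices, days, licenses, ...)
    direct: bool = True   # False for indirect costs such as transportation

    @property
    def total(self) -> float:
        return self.unit_cost * self.quantity


def summarize_budget(items):
    """Return direct, indirect and overall totals for a list of cost items."""
    direct = sum(item.total for item in items if item.direct)
    indirect = sum(item.total for item in items if not item.direct)
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


# Illustrative figures only.
items = [
    CostItem("tablets", 150.0, 10),
    CostItem("training days", 200.0, 2),
    CostItem("software licenses", 20.0, 10),
    CostItem("transportation", 30.0, 12, direct=False),
    CostItem("per diem", 25.0, 40, direct=False),
]

budget = summarize_budget(items)
print(budget)  # {'direct': 2100.0, 'indirect': 1360.0, 'total': 3460.0}
```

Keeping indirect costs as a separate total makes it easier to see how much of the budget goes to items that are often forgotten at the planning stage.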
Volume of data:
The amount of information expected to be collected must be considered in advance so that the type of technology can be properly selected. It is important to calculate the costs inherent to the process, such as storage, and to evaluate scalability, that is, the implications if the data collection system has to receive more data.
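A back-of-the-envelope estimate can help with this kind of planning. The sketch below is only an assumption about how such an estimate might be made: the function `estimate_storage_mb`, its parameters and the example figures (record counts, bytes per record, photo size) are hypothetical and should be replaced with values measured in a pilot.

```python
def estimate_storage_mb(records, bytes_per_record, photos_per_record=0, photo_mb=0.0):
    """Estimate storage in megabytes for a data collection exercise.

    records           -- expected number of survey records
    bytes_per_record  -- assumed size of the structured answers per record
    photos_per_record -- assumed number of photographs attached to each record
    photo_mb          -- assumed average size of one photograph, in MB
    """
    structured_mb = records * bytes_per_record / 1_000_000
    photos_mb = records * photos_per_record * photo_mb
    return structured_mb + photos_mb


# Hypothetical baseline: 5,000 records of about 2 KB each, with 2 photos of 1.5 MB.
baseline = estimate_storage_mb(5_000, 2_000, photos_per_record=2, photo_mb=1.5)

# Scalability check: what happens if three times as many records arrive?
scaled = estimate_storage_mb(15_000, 2_000, photos_per_record=2, photo_mb=1.5)

print(baseline)  # 15010.0
print(scaled)    # 45030.0
```

In this illustrative case almost all of the volume comes from photographs, which is why the types of data collected matter as much as the number of records.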
Characterizing the data makes it possible to get the most out of the tools and to foresee the type of technology to be used. Collecting only numeric data is not the same as collecting geolocation, photographs or opinions. Some types of data are:
Identifier: this type of data is used to identify an element. It is generally the first data collected in each survey and is commonly represented by the acronym ID or a bar code.
Integer and real numbers: numeric values, generally used for mathematical and statistical calculations.
Text values: used to capture text information. Text can be limited, for questions with pre-established answers, or unlimited, to capture opinions. This data can be processed by grouping, filtering, counting, etc., as long as the systematization is done correctly.
Location data: provides the geographic coordinates of the place where the data was taken. It is generally processed through geographic information systems and gives a spatial view of the objects.
Photographic data: the photographic record is vital in monitoring actions. Photographs are proof of the object of study and allow the evaluation to be revisited visually.
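The data types listed above can be pictured as the fields of a single survey record. The following Python sketch is one possible representation, not a prescribed schema: the class `SurveyRecord`, its field names and the list of allowed answers are hypothetical. It also shows the simple counting that limited text answers make possible.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical pre-established answers for a limited text question.
ALLOWED_WATER_SOURCES = {"well", "tap", "river", "truck"}


@dataclass
class SurveyRecord:
    record_id: str                     # identifier (ID or scanned bar code)
    household_size: int                # integer number
    monthly_income: float              # real number
    water_source: str                  # limited text: pre-established answers
    comments: str                      # unlimited text: opinions
    latitude: float                    # location data, decimal degrees
    longitude: float
    photo_path: Optional[str] = None   # photographic data, stored as a file path

    def __post_init__(self):
        # Basic checks at collection time reduce cleaning work later.
        if not self.record_id:
            raise ValueError("record_id must not be empty")
        if self.water_source not in ALLOWED_WATER_SOURCES:
            raise ValueError(f"unexpected water_source: {self.water_source!r}")
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude must be between -90 and 90")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude must be between -180 and 180")


# Illustrative records only.
records = [
    SurveyRecord("HH-001", 4, 250.0, "well", "Water is far away.", 9.0, -79.5),
    SurveyRecord("HH-002", 6, 180.5, "tap", "", 9.1, -79.6, "photos/hh-002.jpg"),
    SurveyRecord("HH-003", 3, 300.0, "well", "Pump needs repair.", 9.2, -79.4),
]

# Limited text answers can be counted directly.
counts = Counter(record.water_source for record in records)
print(counts["well"], counts["tap"])  # 2 1
```

Free text such as `comments` cannot be counted this way without an extra coding step, which is one reason to decide early which questions need pre-established answers.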
Although the systematization of data takes place after collection, it needs to be considered in advance. Data collection should be seen as part of a cycle and not as an isolated activity. Thinking about systematization in advance makes it possible to evaluate aspects such as the choice of technology, processing time, type of information and quality of information.
Sector of intervention:
This is another decisive factor in data collection: actions should be chosen according to the reality of the context of the intervention. Knowing the context in which we work, we can opt for different types of technology. In certain cases the places where the intervention is happening present situations such as social problems, difficult access, rugged terrain, resistance to the use of certain technology or low connectivity. All these factors must be studied in advance in order to ensure efficient information collection systems.
The best solutions for data collection are not always the most sustainable; this depends on the installed capacity and the budget.
There are many open source applications that can be used freely to collect information. However, they involve some technical difficulty in implementation if the team does not have enough training in these tools. This can have a negative impact on data collection, generating frustration and often a lack of continuity when the aim is a solution that can be replicated or maintained in the long term. For this reason, it is important to find solutions suited to the installed capacity, taking into account which skills are present in the work team and whether these capabilities will last.
Another important factor is the available budget. Solutions for data collection are proliferating and many companies offer solutions with exceptional performance. However, it is essential to evaluate the cost of these tools and the possibility of continuing to use them in the long term.
To conclude, the data collection process should be considered as part of the information management system, which is composed of the following steps: data collection, storage, processing, visualization, dissemination and reporting. Any decision made in this step will have an impact on the subsequent processes, hence the importance of studying in detail the method to be used.
Another important point to highlight is that no information system is perfect for every context, considering all the variables mentioned. The data collection process may require a first test to assess whether it meets the necessary conditions and is the one that best suits the needs identified. Innovation is essential to find solutions adapted to specific needs; interoperability and the integration of several solutions may well be the most appropriate approach.
Credits: Blog post by Fanor Camacho. Icon "Data map" by Viktor Vorobyev (Noun Project), CC BY 4.0.