KoboToolbox to BigQuery Integration
Connect your KoboToolbox data to BigQuery to enable onward dashboarding and seamless integration with legacy data.
KoboToolbox offers a brilliant solution for organisations looking to collect data in the field for onward analysis, even when offline. It is popular with humanitarian organisations such as the UN: setup is straightforward, access to data is effective and, of course, the product is free.
The core components of the system are the data collection app, KoboCollect, and the data aggregation service in KoboToolbox, which stores, maintains and manages the data and the app integration.
Why connect KoboToolbox to BigQuery?
There are a number of benefits to an onward connection from KoboToolbox to BigQuery. These include:
- Ability to transform and load data alongside historical data for a seamless data set, including data collected prior to KoboToolbox use.
- Ability to connect BigQuery data to dashboarding (data visualisation) services such as Looker Studio (formerly Data Studio) and Power BI.
- Ability to build advanced models against the data, such as predictive ML models, using the inbuilt capabilities within BigQuery and the broader Google Cloud AI offering (Vertex AI).
- Ownership of data within BigQuery helps protect against any unforeseen access issues with KoboToolbox, i.e. it helps deliver a robust long-term data solution.
Note: If sensitive data is being collected, e.g. personal information, it is important to set up appropriate data hosting and management policies to comply with regional data governance and transfer requirements, e.g. GDPR.
How to make the connection
KoboToolbox offers an API which makes automated data extraction straightforward. There is a range of options available in Google Cloud for accessing the data through the API and writing to BigQuery, including Cloud Functions and Cloud Dataprep. For most organisations with a batch pipeline (e.g. a daily load of yesterday's data), Cloud Functions is perhaps the easiest and lowest-cost option, although it does require writing a suitable function, e.g. in Python.
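As a rough illustration of the extraction step, the sketch below builds a request for one day of submissions and pages through the results. It assumes the KoboToolbox v2 data endpoint, token authentication and a MongoDB-style `query` filter on `_submission_time`. The server URL, asset UID and token are placeholders rather than details of any real project, so check them against your own account before relying on this.

```python
import json
from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_data_url(base_url, asset_uid, day):
    """Build a data URL limited to submissions received on `day`.

    Assumes the v2 endpoint layout and a MongoDB-style `query` parameter.
    """
    query = {
        "_submission_time": {
            "$gte": day.isoformat(),
            "$lt": (day + timedelta(days=1)).isoformat(),
        }
    }
    params = urlencode({"format": "json", "query": json.dumps(query)})
    return f"{base_url.rstrip('/')}/api/v2/assets/{asset_uid}/data/?{params}"


def fetch_submissions(url, token):
    """Collect every page of results, following the `next` link if present."""
    submissions = []
    while url:
        request = Request(url, headers={"Authorization": f"Token {token}"})
        with urlopen(request, timeout=60) as response:
            page = json.load(response)
        submissions.extend(page.get("results", []))
        url = page.get("next")  # None (or missing) on the last page
    return submissions
```

A daily run would call `build_data_url` with yesterday's date (`date.today() - timedelta(days=1)`) and pass the result to `fetch_submissions` along with an API token read from a secret or environment variable.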
For a recent project with a US environmental charity we did just that. We created a simple Cloud Function, scheduled daily, which extracts the latest data from KoboToolbox, does some basic processing to transform the data in line with historical data, and loads it into BigQuery to power Looker Studio dashboards.
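The transformation step is usually the most project-specific part. The sketch below shows one possible shape for it: renaming KoboToolbox keys (which often carry a group prefix such as `group/question`) to the column names of an existing historical table, and converting a numeric answer that arrives as text. The column names and the mapping are invented for illustration and are not the schema used in the project described above.

```python
# Hypothetical mapping from KoboToolbox keys to historical column names.
COLUMN_MAP = {
    "_id": "submission_id",
    "_submission_time": "submitted_at",
    "site_details/site_name": "site_name",
    "site_details/water_temp": "water_temp_c",
}


def to_float(value):
    """Convert an answer to float, returning None for blanks or bad values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def transform(submission):
    """Map one raw submission to a row matching the historical schema."""
    row = {target: submission.get(source) for source, target in COLUMN_MAP.items()}
    # Numeric answers may arrive as strings, so convert them explicitly.
    row["water_temp_c"] = to_float(row["water_temp_c"])
    return row
```

Keys that are not listed in the mapping (for example internal metadata fields) are dropped, which keeps the loaded rows aligned with the historical table.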
To schedule the process in GCP, a Cloud Scheduler job is set up to publish a Pub/Sub message that triggers the Cloud Function. For small data workloads all of this can be achieved within, or close to, the free usage tiers on GCP.
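A minimal sketch of a Pub/Sub-triggered entry point is shown below. It assumes the background-function signature `(event, context)` used by first-generation Python Cloud Functions, where the message body arrives base64-encoded in `event["data"]`. The optional `load_date` field in the message is a hypothetical convention for re-running a specific day, not something Cloud Scheduler provides by itself.

```python
import base64
import json
from datetime import date, timedelta


def parse_message(event):
    """Decode the Pub/Sub message body as JSON; return {} if absent or not JSON."""
    raw = event.get("data")
    if not raw:
        return {}
    try:
        payload = json.loads(base64.b64decode(raw).decode("utf-8"))
    except ValueError:
        return {}
    return payload if isinstance(payload, dict) else {}


def resolve_load_date(payload, today):
    """Use an explicit `load_date` (hypothetical field) or default to yesterday."""
    if "load_date" in payload:
        return date.fromisoformat(payload["load_date"])
    return today - timedelta(days=1)


def daily_load(event, context):
    """Entry point: decide which day to load, then run the pipeline steps."""
    payload = parse_message(event)
    load_date = resolve_load_date(payload, date.today())
    print(f"Loading KoboToolbox submissions for {load_date}")
    # The extract, transform and load steps would be called here.
    return load_date
```

Defaulting to yesterday means the scheduled job needs no meaningful payload, while a manual message such as `{"load_date": "2024-01-15"}` can be published to backfill a missed day.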
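An advantage of managing this process directly within a Cloud Function is that it makes authentication between the Cloud Function and BigQuery straightforward: the function runs as a service account that can be granted BigQuery permissions directly, so no separate credentials need to be stored.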
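To illustrate, the sketch below loads rows using the `google-cloud-bigquery` client library. It assumes that `bigquery.Client()` with no arguments picks up the function's runtime service account through Application Default Credentials, and it relies on the library's default load settings (appending to the destination table). The table ID is a placeholder, and the optional `client` argument is included only so the function can be exercised without a live project.

```python
def load_rows(rows, table_id, client=None):
    """Append `rows` (a list of dicts) to a BigQuery table; return the row count.

    `table_id` is a placeholder of the form "project.dataset.table".
    """
    if not rows:
        return 0  # Nothing to load, so skip creating a load job.
    if client is None:
        # Imported lazily; requires the google-cloud-bigquery package.
        from google.cloud import bigquery

        # No key file needed inside a Cloud Function: the client is
        # assumed to use the function's service account automatically.
        client = bigquery.Client()
    job = client.load_table_from_json(rows, table_id)
    job.result()  # Wait for completion; raises if the load job failed.
    return len(rows)
```

In this arrangement the only setup needed is granting the function's service account suitable BigQuery roles on the target dataset.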
The equivalent process with other cloud data providers such as AWS, e.g. using a Lambda function and Redshift, is very similar.
What is an equivalent solution for businesses? (KoboToolbox versus ODK)
Whilst Kobo is set up and funded for the benefit of charities and not-for-profits, the principle has plenty of commercial applications. KoboToolbox is based on Open Data Kit (ODK), an open source data collection service. Where they differ is primarily in the hosting options. Because Kobo is funded for humanitarian purposes, its basic server and data aggregator are provided free of charge, whereas for ODK the options are:
- pay a fee for hosting with ODK Cloud, from $169 per month
- self-host the open source solution, e.g. on your own servers or a cloud service such as Google Cloud (GCP)
Note: Kobo does also offer a premium paid hosting solution for larger organisations and data collection tasks.
We would be delighted to help with setup, including form design, whether you are looking to adopt ODK or KoboToolbox.
We are pleased to offer much-reduced rates for work and consultancy with not-for-profit organisations. Contact us to find out more.