Comparing spatial patterns of crowdsourced and conventional bicycling datasets

Outline of the research

Evidence-based decision-making and provision planning in transport depend fundamentally on data that give a clear understanding of the spatial distribution of supply, demand and utilisation. For active transport (cycling and walking) planning in particular, there is a stark paucity of data on infrastructure utilisation. The main reason is that active transport journeys tend to occur at much finer spatial scales than journeys undertaken by automobile or public transport. For cycling, conventional data collection usually relies on manual or electronic bicycle counts, travel surveys, and questionnaires; all of these have substantial limitations, with consequences for the reliability, resolution, and utility of the resulting data.

By contrast, the crowd-sourced data generated by smartphone cycling apps have the potential to overcome a number of these issues. Smartphone apps can provide individual journey data at finer spatial and temporal scales, without relying on the presence of sensors or surveyors. This allows for neighbourhood-scale investigations of individual and group behaviour, with a rich spatial and social context for analysing movement, well beyond the capabilities of conventional cycling data collection. However, the potential drawbacks of crowd-sourced smartphone data are well known: users of the apps don’t necessarily represent an accurate cross-section of the whole population, owing to self-selection bias and unequal access to resources.

As part of her PhD thesis, Dr. Lindsey Conrow and her colleagues Elizabeth Wentz and Trisalyn Nelson at Arizona State University, and Professor Chris Pettit from the City Futures Research Centre at the University of New South Wales, investigated the correspondence between conventional cycling data (Super Tuesday cycling counts) and crowdsourced smartphone cycling app data (using data from the Strava app)[1]. Specifically, they used Local Indicators of Spatial Autocorrelation (LISA) analyses to investigate areas of similarity and dissimilarity between these data types across the Sydney Metropolitan region. Additionally, the authors examined the potential influence of neighbourhood, infrastructure and socio-economic/demographic characteristics on the similarity or dissimilarity between the conventional and crowd-sourced data.
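The core idea of comparing the two data types with a local indicator can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the six sites, their counts, the chain-shaped neighbour structure and the function names ranks and local_morans_i are all hypothetical, and the significance testing that a full LISA analysis requires (usually by random permutation) is omitted. In practice, PySAL's esda package offers a Moran_Local class that handles the weights and the inference.

```python
from statistics import mean


def ranks(values):
    """Rank values from 1 (smallest) upward, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def local_morans_i(values, neighbours):
    """Local Moran's I for each site, using row-standardised neighbour weights.

    I_i = (z_i / m2) * mean(z_j for j in neighbours of i),
    where z are deviations from the mean and m2 is their mean square.
    """
    n = len(values)
    mu = mean(values)
    z = [v - mu for v in values]
    m2 = sum(d * d for d in z) / n
    result = []
    for i in range(n):
        nbrs = neighbours[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical ridership at six count sites arranged along a corridor.
manual_counts = [50, 40, 10, 20, 60, 30]   # conventional count data (made up)
app_counts = [300, 250, 200, 40, 60, 20]   # crowdsourced app data (made up)

# Rank each dataset separately, then take the difference in rank per site.
rank_diff = [a - c for a, c in zip(ranks(app_counts), ranks(manual_counts))]

# Assumed neighbour structure: each site borders the next one along the corridor.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

local_i = local_morans_i(rank_diff, neighbours)
for site, (diff, stat) in enumerate(zip(rank_diff, local_i)):
    print(f"site {site}: rank difference {diff:+.1f}, local Moran's I {stat:.3f}")
```

A positive local value indicates that a site's rank difference resembles that of its neighbours (a local cluster), while a negative value flags a site that differs from its surroundings. Whether any such value is statistically significant can only be judged with a permutation test, which this sketch leaves out.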

How AURIN was used

Dr. Conrow and her colleagues used a range of datasets from the AURIN workbench to enable this research. These included the Super Tuesday Bike Census datasets, available nationwide in point-level format. The researchers also obtained Statistical Area Level 2 datasets on income, population and socio-economic characteristics produced by the Australian Bureau of Statistics (ABS Data by Region, ABS SEIFA), along with the Local Environment Plans for New South Wales.

Dr Conrow’s use of LISA analyses in this context represents a novel application of these kinds of methods. Her research team undertook these spatial autocorrelation analyses using the PySAL Python library, which is available for free download from Arizona State University. However, users of the AURIN Portal can also undertake these and a broad range of other spatial statistical analyses using the Spatial Statistics toolbox within the portal.

Spatial distribution of rank differences between data types (left) and location and typology of significant rank differences (right). From Conrow et al. (2018).

AURIN provided convenient access to trusted data that not only helped me examine the research questions for my PhD research, but also improved the overall efficiency of the work. This analysis involved several different types of data generated by varying agencies, which typically requires the researcher to track down reliable sources and delayed responses through long email chains. Having everything available through AURIN allowed me to focus my time on methods and analysis instead of the data hunt.

– Dr Lindsey Conrow

Findings and Impacts of the Research

The analyses found a relatively strong correlation between the conventional and crowdsourced datasets in terms of reporting ridership overall. However, the researchers also reported important spatial and socio-demographic distinctions. Sites identified by the LISA analyses as having statistically significant similarity between the datasets were clustered in Parramatta. These sites had low ridership in both datasets, less cycling infrastructure, high rates of using either public transport or private motor vehicles for the journey to work, and a greater prevalence of socio-economic disadvantage. The LISA analyses also identified a number of sites of dissimilarity around Greater Sydney, where there were substantial discrepancies between the ridership reported by the Super Tuesday Cycling Census data and by the Strava crowdsourced data. These sites were clustered predominantly in the northern suburbs of Greater Sydney, and all of them had higher proportional ridership in the Strava dataset. The lack of cycling infrastructure to support daily commuter cycling in these areas suggests that these results reflect the area’s popularity as a recreational cycling location for Strava users.

Dr. Conrow and her colleagues showed that crowd-sourced and conventional datasets, while correlating highly overall, still have fundamentally different narratives to contribute to the planning and investment cycle for cycling infrastructure development. Using crowd-sourced data in isolation to plan and invest in cycling infrastructure could lead to facilities that favour only those cyclists who use the cycling apps, rather than all current and potential riders in a region. In particular, when striving for mode shift and encouraging the use of cycling as a viable, safe, and sustainable transport option, it is crucial to understand how infrastructure, neighbourhood, land-use and socio-demographic factors shape the relationship between conventional and crowdsourced data.

[1] Conrow, L., Wentz, E., Nelson, T., & Pettit, C. (2018). Comparing spatial patterns of crowdsourced and conventional bicycling datasets. Applied Geography, 92, 21-30.