With a real focus on our Unified API and open data policy in recent months, both on this blog and through the Hackathons and events (such as the Urban Traffic Hackathon a few weeks ago) that TfL staff have been involved with, we’ve received lots of great feedback and questions from the developer community, just as we’d hoped.
One question that has cropped up many times concerns customer volume and flow data: how can we help developers create apps that take into account how busy particular lines, stations and platforms are likely to be when customers are planning a journey?
To provide an update on where we are with this data, TfL’s Data Services Manager Ryan Sweeney offers this summary, and asks for your feedback to help us ensure we’re providing data that is both relevant and useful:
TfL have participated in a number of Hackathons this autumn, focused on working with developers to create innovative new solutions that help make our customers’ journeys better. In order to facilitate these events, we’ve released a range of previously unseen data from across our many modes of transport, enabling those working with the data at these events to be truly creative and original in the solutions they develop.
One of the key new data sets we have released is customer volume and flow data across the whole of the TfL rail and bus network, collected from our automatic fare collection system.
The released data is a two-week sample from October, aggregated to 30-minute intervals. In addition to aggregation, we have modified low values for data protection reasons. The data is available as a zip file to download from the TfL website and includes a metadata spreadsheet listing and describing all of the fields:
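To make "aggregated to 30-minute intervals" concrete, here is a minimal Python sketch of how individual gateline taps could be rolled up into 30-minute counts with small values suppressed. It is illustrative only: the (station, timestamp) input shape, the function names and the threshold of 5 are our own assumptions, not a description of how TfL actually prepared this release.

```python
from collections import defaultdict
from datetime import datetime


def to_half_hour(ts: datetime) -> str:
    """Floor a timestamp to the start of its 30-minute interval, e.g. 08:47 -> '08:30'."""
    minute = 0 if ts.minute < 30 else 30
    return f"{ts.hour:02d}:{minute:02d}"


def aggregate_taps(taps, low_value_threshold=5):
    """Count gateline taps per (station, date, 30-minute interval).

    taps: iterable of (station, datetime) pairs (a hypothetical input shape).
    low_value_threshold: hypothetical cut-off; counts below it are replaced
    with None as a stand-in for the real release's low-value modification.
    """
    counts = defaultdict(int)
    for station, ts in taps:
        counts[(station, ts.date().isoformat(), to_half_hour(ts))] += 1
    return {
        key: (n if n >= low_value_threshold else None)
        for key, n in counts.items()
    }
```

Replacing low counts with None is just one way to model suppression; rounding or banding would be equally plausible sketches.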
- 30 minute entries.csv: This data shows the total number of gateline entries by station, day and 30-minute time interval.
- 30 minute exits.csv: This data shows the total number of gateline exits by station, day and 30-minute time interval.
- Bus journeys: This data shows bus boardings by stop, route and time of journey.
- Journey time.txt: This data shows the journeys made between station pairs and the time taken to make each journey.
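As an example of the kind of analysis the entries file supports, the sketch below finds each station's busiest 30-minute interval across the sample. The column names (station, day, interval, entries) are hypothetical placeholders; the metadata spreadsheet in the zip lists the real fields. Rows whose count is not a plain whole number are skipped, in case modified low values appear in a non-numeric form.

```python
import csv
import io
from collections import defaultdict


def busiest_intervals(csv_text: str) -> dict:
    """Return {station: (interval, total_entries)} for each station's busiest interval.

    Assumes hypothetical columns 'station', 'day', 'interval' and 'entries';
    check the metadata spreadsheet for the real field names.
    Counts are summed across days, so each total covers the whole sample.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row["entries"].strip()
        if not raw.isdigit():
            # Skip suppressed or non-numeric values rather than guessing them.
            continue
        totals[(row["station"], row["interval"])] += int(raw)

    busiest = {}
    for (station, interval), n in totals.items():
        # Keep the first interval seen when two intervals tie.
        if station not in busiest or n > busiest[station][1]:
            busiest[station] = (interval, n)
    return busiest
```

The same approach should carry over to the exits file by pointing it at the exits count column instead.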
We’ve already seen a number of interesting and useful applications and concepts created with these data sources at the Hackathons we have attended. From a developer’s point of view, we would like to understand your answers to the following questions:
- What would you use this data for?
- How granular (time and location) does this data need to be for it to be beneficial? (Please note that some aggregation and modification to remove low values is required in order to openly release ticketing data)
- Is historic data useful?
- Are additional data sources needed to complement this data before you can use it?
- Is the data easy to understand?
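One practical point behind the granularity question: coarser views can always be rebuilt from finer ones, but not the other way round. The sketch below rolls 30-minute counts up to hourly totals; the (station, "HH:MM") key shape and the function name are hypothetical simplifications. Treating a suppressed value (None here) as making the whole hour unknown is one conservative choice; an app could instead treat the known part as a lower bound.

```python
from collections import defaultdict


def to_hourly(half_hour_counts: dict) -> dict:
    """Sum counts keyed by (station, 'HH:MM') into hourly totals keyed by (station, 'HH:00').

    A None (suppressed) count makes that hour's total unknown, so None propagates.
    """
    hourly = defaultdict(int)
    for (station, interval), n in half_hour_counts.items():
        key = (station, interval[:2] + ":00")
        if n is None or hourly[key] is None:
            hourly[key] = None
        else:
            hourly[key] += n
    return dict(hourly)
```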
Please let us know your thoughts on these questions, and if this data promises to be beneficial to the developer community, we will investigate the possibility of making this data available on a periodic basis.