
Data explosion fuels rail policy innovations



It is said that the ‘Railway King’ George Hudson once insisted: “I will have no statistics on my railway!” We have come a long way since anyone could express such a sentiment without being thought a fool. Today it is data, as statistics are now called, that can be crowned king.

In the 19th century, successive Railway Regulation Acts increased the levels of statistical information required from railway companies by the Board of Trade. To supply this, companies created statistical departments, although the North Eastern Railway was the only British railway company to approach the sophistication of US railroads in its collection and analysis of data.

In the tough, competitive conditions of the inter-war years, before nationalisation in 1948, railway companies were forced to adopt more rigorous methods.

Today, as Jonathan Raper of TransportAPI says: “Data is at the heart of public transport. Releasing it and ensuring that it is used to the best effect in regulation, and to create new value and new products and services, is what we want to do. More or less everyone agrees with that now.”

Our ability to collect this data, often as part of systems designed primarily for other purposes, has grown enormously in a very short time. The cost of collection has fallen correspondingly, although turning information into useful knowledge remains a substantial challenge: it took three years of iteration between Transport for London and MIT to develop an algorithm for one particular project.

Broadly speaking, three types of information are collected relating to the railway industry.

Firstly, there is a host of quantitative data about train running, timetables, infrastructure and track, and about safety issues such as signals passed at danger (SPADs). Inevitably, most of this data is sourced by Network Rail, which makes it available to the public under an Open Government Licence. NR data feeds became available in December 2012, allowing website and app developers to create information sources of great value to passengers.

The NR website has a link to ‘data feeds’ and the sites that have been created using its data. For example, the third-party website Live PPM (Public Performance Measure) displays real-time performance information for each TOC, and receives more than ten million hits a month. Open Train Times has track diagrams showing the real-time position of each train in signal sections, and users can see the train arrival and departure times at stations. Trains.im has bar charts showing instantly which TOCs are having a good or bad day.

These developments are of value not only to rail customers, but also to the operators. As Peter Hicks, developer of Open Train Times, explains: “It’s popular with staff who use it in an advisory capacity. Platform staff use it to find out where trains are, I’ve had emails from signallers who want to see what’s happening at work when they’re off-shift, and I’ve been told that the British Transport Police has even used it when planning incident responses.” By releasing the data, NR has gained tools of value to both itself and the TOCs… at zero cost.

TOCs themselves generate data on timetables, fares and the routes over which fares are valid. The Rail Settlement Plan (today’s equivalent of the Railway Clearing House) within ATOC is “authorised to make this data available under licence to third parties on behalf of the train companies, in order to promote rail travel and encourage the wider distribution of accurate and consistent rail travel information on an impartial basis”. The terms and conditions to which users must sign up include safeguards to protect the integrity of the data and to ensure that it is used accurately.

Following changes announced by the Rail Delivery Group in early 2014, a new online registration platform was launched in October giving developers automated access to the National Rail Enquiries (NRE) Darwin Webservice, making it even easier for people and organisations to use live train running information in their apps, websites and other services.

Darwin, paid for by the train operators, is the system that analyses raw data from numerous rail industry sources to predict train arrival and departure times. Opening up access to it is part of the work of an RDG work stream, headed by David Brown of Go-Ahead, to extend openness and transparency in the rail industry.

ATOC Commercial Director David Mapp is leading a joint research project with the Rail Safety & Standards Board to examine customer preferences, such as how passengers travel to stations and what catering they want. The research is looking at the journey experience and at ways in which TOCs and third-party retailers such as thetrainline.com and redspottedhanky.com can use data to encourage additional trips through email and phone marketing.

The Rail Delivery Group has a dedicated transparency work stream, aimed at extending openness in the rail industry by making relevant data and information more accessible to rail users and other stakeholders.

On a broader level, the Office of Rail Regulation publishes statistical releases on eight themes: passenger and freight rail performance; passenger rail usage; freight rail usage; passenger rail service satisfaction; regional rail usage; key safety statistics; rail finance; and rail infrastructure, assets and environment. These are mostly self-explanatory, although the passenger rail service satisfaction release deals specifically with customer complaints.