Data Science at the Home Office

The opening paper at this year’s Annual Analytics Summit, ‘Data Science at the Home Office’, was presented by Rupert Chaplin, Head of Data Science at the Home Office. The Home Office employs some 30,000 people, has an annual budget of around £11bn and is supported by a team of about 30 data scientists.

The big ‘operational delivery arms’ of the Home Office cover crime, policing, fire, visas, immigration enforcement, border force, passport control and terrorism. The department has a dedicated network and access to the software and tools it needs. It has also invested heavily in data engineering to make sure the data is there and ready to be analysed. It provides a close link between the professional analysts and the digital data technologists – the architects and the people responsible for maintaining the systems.

One of the tools used is IBM’s InfoSphere, which is used to standardise data from source systems and to clean up data outputs for analysis. InfoSphere also allows ‘bucketing’, that is, grouping different entities where there might be an element of similarity.
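The ‘bucketing’ idea can be illustrated with a minimal sketch, assuming a very simple grouping rule. This is not how InfoSphere works internally, and the field names, the sample records and the `blocking_key` rule (first letter of surname plus birth year) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def blocking_key(record):
    """Hypothetical bucketing rule: first letter of the surname plus birth year."""
    surname = record["surname"].strip().upper()
    return (surname[:1], record["birth_year"])


def bucket_records(records):
    """Group records that share a key, so only those need detailed comparison."""
    buckets = defaultdict(list)
    for record in records:
        buckets[blocking_key(record)].append(record)
    return dict(buckets)


# Invented sample records, for illustration only.
records = [
    {"id": 1, "surname": "Smith", "birth_year": 1980},
    {"id": 2, "surname": "Smyth", "birth_year": 1980},
    {"id": 3, "surname": "Jones", "birth_year": 1975},
]
buckets = bucket_records(records)
```

Here the two similar-looking surnames land in the same bucket, so a later, more expensive comparison only has to consider records within that bucket rather than every possible pair.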

The department uses a probabilistic matching method that takes into account typographical errors, alternative names, etc. It is applied across all of the identifiers. A search for an individual produces a score for each of the possible identifiers, and these are combined into a consolidated score for the person. Some of the processes were extremely computationally intensive, so they had to be scaled appropriately for operational use.
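The scoring idea described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the department's actual method: the identifiers, the `WEIGHTS` values and the sample records are invented, and the weighted average of string similarities (using `difflib.SequenceMatcher` from the Python standard library) is a simple stand-in for a full probabilistic matching model.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each identifier contributes to the overall score.
WEIGHTS = {"name": 0.5, "date_of_birth": 0.3, "passport_number": 0.2}


def similarity(a, b):
    """Return a similarity between 0 and 1 that tolerates small typos."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_candidate(query, candidate):
    """Score each identifier separately, then combine into a weighted average."""
    per_field = {f: similarity(query[f], candidate[f]) for f in WEIGHTS}
    total = sum(WEIGHTS[f] * s for f, s in per_field.items())
    return per_field, total


# Invented records, for illustration only.
query = {"name": "Jon Smith", "date_of_birth": "1980-02-01",
         "passport_number": "123456789"}
candidates = [
    {"name": "Jane Smythe", "date_of_birth": "1975-11-30",
     "passport_number": "987654321"},
    {"name": "John Smith", "date_of_birth": "1980-02-01",
     "passport_number": "123456789"},
]

# Rank candidates by their consolidated score, best match first.
ranked = sorted(candidates, key=lambda c: score_candidate(query, c)[1],
                reverse=True)
```

The misspelt ‘Jon Smith’ still ranks ‘John Smith’ first, because the small difference in the name is outweighed by exact agreement on the other identifiers. Scoring every pair of records in this way grows quickly with volume, which is one reason such processes are computationally intensive.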

To illustrate how the department uses data, Chaplin cited two case studies. The first concerned visa processing. There were about 3 million visa applications last year, some simple and some complex. The department has a highly effective model that provides insight to support decisions on granting or refusing visa applications. He stressed, though, that the model would not automatically grant or reject any visa application; the final decision always had to be made by a case worker.

The second case study concerned the application of data science at the border. Analytics has proved particularly useful in dealing with the millions of entries into the UK each year. Last year alone there were around 140 million movements across the UK’s borders, and analytics had been useful in detecting and identifying a range of threats and vulnerabilities, including border infringement, potential insurgencies and incidents of modern slavery, an all too common occurrence these days.