
DSS on Trial

- validation of decision support systems for air traffic management

by Paula Leal de Matos and David Marsh

European airspace is becoming more congested and air traffic is expected to grow for the foreseeable future. Decision support systems will have an increasingly important role in helping air traffic controllers meet this demand, by contributing to the growth of capacity of the air traffic system while maintaining or improving current levels of safety. The development and evaluation of prototypes are essential steps in the validation of new decision support systems for air traffic control. This paper describes how National Air Traffic Services Limited validates new decision support systems for air traffic controllers.

-oo0oo-

European airspace is becoming more congested and air traffic is expected to grow for the foreseeable future. Decision support systems (DSS) will have an increasingly important role in helping controllers meet this demand, by contributing to the growth of capacity of the air traffic system while maintaining or improving current levels of safety.

National Air Traffic Services Limited (NATS) is responsible for air traffic control (ATC) in UK airspace, in part of the North Atlantic airspace and at most major UK airports. NATS has been investing in new DSS for Air Traffic Management (ATM) operations in the UK and in Europe, in collaboration with other European air traffic services providers. Its Department of ATM Systems Research develops and validates new DSS for ATC.

Here, 'validate' means using a prototype to assess how effective the DSS would be if it were implemented for operational use. Some of these systems are intended for operational implementation in a few years' time; others are developed to be in operation in 2005, 2010 or even later. Typically, several stages of validation will be needed between initial design and introduction of a new DSS into operations. Figure 1 shows an idealised validation 'route map', in which each stage is a different type of validation activity. In between each activity, the design is revised and refined.


Figure 1: Validation Route Map

This paper concentrates on prototype development and small-scale real-time simulations. For a discussion of fast-time simulation, see Phillips & Marsh. More details of all aspects of validation in ATC can be found at www.atmdc.nats.co.uk.

Development of the prototype

When developing the prototype, modellers and engineers have to understand the operational context in which the DSS is going to be used and define how the DSS will fit into that context. For instance, to develop the prototype of a Final Approach Spacing Tool (FAST) for Heathrow Approach Control, an iterative process was adopted that involved close observation of different controllers at work and discussion of the observations with them (see Marsh et al, 1999). This resulted in a structured description of the controllers' tasks. It also indicated those tasks which controllers do well, and thus where a DSS was unlikely to provide significant benefits, as well as tasks where a DSS might provide some benefit.

An operational concept describes the controllers' tasks in the context of providing the ATC service and explains how the DSS would fit in with those tasks. Some operational concepts, typically the ones aimed at longer time horizons, involve significant changes from current working procedures; others will result in minimal alterations to current procedures. The definition of the operational concept is also an iterative process: the initial concept is revised as the prototype system becomes tangible and is tested by controllers. Current operational controllers take an active part in defining these revisions.

The design of the user interface is iterative and based on the production of prototypes, using operational input from controllers. Meanwhile, the functions behind the user interface are also developed and tested. Static or animated prototypes have their place in the early stages of design of the interface. However, the effectiveness of a DSS in ATC depends on the controllers being able to work quickly, easily and safely with the system. This cannot be assessed from a simple animation. So real-time trials, in which controllers use a prototype to simulate real air traffic control, are an essential stage in the process of building confidence that a DSS will support safe, orderly and expeditious ATC.

Evaluation of the prototype

The prototype is evaluated by means of real-time controller-in-the-loop simulation trials that typically take up two weeks at a time. A simulation facility is set up to represent the relevant ATC operations centre. It comprises controller radar positions, phone and radio communications, a traffic simulator and a large array of equipment to take measurements during the trial (see figure 2).


Figure 2: NATS Research Facility

The preparation of the simulation can take up to five months and involves, among other activities, the definition of the trial design, the simulation layout and the analysis methodology to be used to test the impact of the DSS. The trial team is multi-disciplinary: it includes controllers, software engineers, psychologists, OR analysts, staff performing the role of pseudo-pilots, and input data preparation staff, among others. The team is steered by a more senior OR analyst, who is responsible for running the validation and for putting together the report with the conclusions of the validation. The remainder of this section explains the phases in the evaluation: 1) trial design and testing, 2) measurements and analysis methodology, 3) trial, 4) analysis of trial results.

Trial design and testing

The objectives of the evaluation depend on the DSS under development. However, they usually include gathering controllers' opinions on the tool and measuring:

  • how controllers use the tool and how usable they find it
  • the effect of the tool on controllers' workload, on safety and on the quality of service to aircraft.

To evaluate the effects of the tool two systems are typically simulated: one representing the current ATC operation without the new DSS and a second simulating the ATC operation with the new DSS.

Typically, a trial consists of training sessions, measured exercises and debriefs. It might also include development exercises aimed at further refinement of the DSS. The training sessions enable controllers to become familiar with the system, and the measured exercises provide the results that are used for statistical analysis of the impact of the DSS. The traffic samples used in the trials are usually based on observed traffic under a variety of operational scenarios (eg wind direction and speed, visibility, traffic volumes and traffic composition). Marsh et al (1998) and Pomroy et al (1997) describe various trial designs in more detail.

Taking controllers off operations to take part in trials is expensive. So, a significant part of the effort in designing and preparing a trial is testing to make sure that the trial will work when the controllers are present: namely, testing the planned trial design and testing the DSS to make sure that it is fit for the trial.

Measurements and analysis method

The measurements taken during a trial range from qualitative to quantitative data and objective to subjective data. For instance, some components of controller workload can be measured objectively by recording radio-telephony and phone activity or number of instructions given to pilots and measured subjectively by asking controllers for two types of self-assessment: Instantaneous Self Assessment (ISA), and Task Load Index (TLX). ISA measures the controller's perception of his or her workload at regular intervals during an exercise. An ISA prompt flashes every two minutes during the exercise, and the controller responds by pressing one of five buttons, depending on how 'busy' he or she feels. The buttons range from 1, corresponding to under-utilised, through to 5, corresponding to excessive workload. TLX is a questionnaire that was originally developed by NASA. It is distributed immediately at the end of each exercise, and asks controllers to rate their workload in terms of several factors such as mental demand, time pressure and frustration experienced.
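By way of illustration, the short Python sketch below shows one plausible way of recording and summarising ISA responses. The data structure, field names and summary statistics are assumptions made for the example only; the article does not describe how NATS stores or processes these data.

```python
# Illustrative sketch only: the record layout and summary statistics are
# assumptions, not the NATS implementation.
from dataclasses import dataclass
from statistics import mean

@dataclass
class IsaResponse:
    exercise_id: str   # measured exercise the rating belongs to
    elapsed_s: int     # seconds into the exercise when the prompt flashed
    rating: int        # 1 = under-utilised ... 5 = excessive workload

def summarise_isa(responses):
    """Group ISA ratings by exercise and report mean and peak workload."""
    by_exercise = {}
    for r in responses:
        by_exercise.setdefault(r.exercise_id, []).append(r.rating)
    return {ex: {"mean": mean(vals), "peak": max(vals), "n_prompts": len(vals)}
            for ex, vals in by_exercise.items()}

# Example: prompts every 120 seconds during a short exercise (invented data)
responses = [IsaResponse("EX01", t, r) for t, r in
             [(120, 2), (240, 3), (360, 4), (480, 3)]]
print(summarise_isa(responses))
```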

To analyse the controller workload in more detail, the Performance and Usability Modelling in ATM toolset (PUMA) can be used. PUMA can be used to explain why workload has varied and predict how workload may respond to further changes. The data collection comprises two stages: the first stage is the video-recording of the controller. The second stage is a debrief of the controller by a human factors specialist, that is aimed at understanding the controller's actions and thought processes. These data are used to develop a very detailed, parallel model of workload. (There is more explanation at the website, www.atmdc.nats.co.uk.)

To assess how usable the controllers find the DSS, a method developed by NATS is used. The method includes: observation of the training and the trial by usability specialists, structured debriefs, video recording, and Software Usability Measurement Inventory questionnaires (SUMI, developed by University College, Cork). SUMI covers aspects such as how helpful, easy to learn and to use controllers find the DSS. It also addresses the question of whether controllers feel 'in control' when they are using the DSS. Other specialist measurements that contribute to the usability assessment include eye-movement tracking and situation awareness measurement.

Controllers' opinions are gathered through debriefs, questionnaires and observation of the exercises. These subjective data complement other measurements and allow the analysis to address areas not otherwise covered. For example, in relatively short trials of early prototypes, safety is often best addressed through structured debriefs looking at what might go wrong, how likely it is, and what its impact would be, rather than through quantitative measurements, because the amount of quantitative data is likely to be insufficient to prove that a system meets the safety criteria demanded of ATC.

Precisely which measurements are taken for a particular trial depends on the DSS under evaluation and the objectives of the trial. Other measurements may be defined which are specific to a trial. For example, a prototype of a Final Approach Spacing Tool (FAST) was developed for Heathrow Approach Control (see Figure 3). To examine the impact of FAST on the accuracy and consistency of spacing, the spacing of each aircraft from the following one was recorded as each lead aircraft passed the point 4 nautical miles from the runway threshold (Marsh et al, 1998).


Figure 3: FAST Touch Screen
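As an illustration of such a trial-specific measurement, the sketch below compares the accuracy (mean deviation from a target spacing) and consistency (spread) of recorded spacings with and without the tool. The target spacing and the sample values are hypothetical, chosen only to show the shape of the calculation; they are not trial data.

```python
# Sketch of how spacing accuracy and consistency might be compared;
# target spacing and sample values are hypothetical.
from statistics import mean, stdev

def spacing_summary(spacings_nm, target_nm):
    """Accuracy = mean deviation from the target spacing;
    consistency = spread (standard deviation) of the recorded spacings."""
    return {
        "mean_spacing": mean(spacings_nm),
        "mean_error": mean(s - target_nm for s in spacings_nm),
        "std_dev": stdev(spacings_nm),
    }

baseline  = [5.2, 4.7, 5.8, 4.9, 5.5, 6.1]   # without the tool (illustrative)
with_fast = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8]   # with the tool (illustrative)
target = 5.0                                  # hypothetical required spacing, nm
print("baseline ", spacing_summary(baseline, target))
print("with FAST", spacing_summary(with_fast, target))
```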

To analyse the data, the objectives of the trial are broken down into more detailed objectives. For the measurements relevant to each detailed objective, descriptive statistics are produced, usually in the form of graphs or histograms. These provide an indication of each measure's size and range and an informal comparison between the systems with and without the prototype DSS. For each detailed objective 'null' and 'alternative' hypotheses are specified together with the data and statistical method to be used to test them. For example, to examine the effect of a tool on controller workload the null hypothesis is:

The workload without the tool is the same as with the tool;

and the alternative hypothesis is:

The workload without the tool is different from that with the tool.

Details of the statistical tests used can be found in Marsh et al (1998) and Pomroy et al (1997).
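The reports cited above give the tests actually used in the trials. Purely as an illustration of how such a hypothesis might be tested, the sketch below runs a two-sided Mann-Whitney U test on hypothetical per-exercise workload scores, assuming SciPy is available; the figures are invented for the example and are not trial results.

```python
# Illustrative two-sample comparison of workload with and without the tool.
# The choice of test (Mann-Whitney U) and the data are assumptions for the
# example; the trial reports describe the tests actually used.
from scipy import stats

workload_without_tool = [3.1, 3.4, 2.9, 3.6, 3.2, 3.5]  # per-exercise mean ISA
workload_with_tool    = [2.7, 3.0, 2.8, 2.6, 3.1, 2.9]

# H0: workload without the tool is the same as with the tool
# H1: workload without the tool is different from that with the tool
u_stat, p_value = stats.mannwhitneyu(workload_without_tool,
                                     workload_with_tool,
                                     alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: evidence of a difference in workload")
else:
    print("Fail to reject H0: no evidence of a difference")
```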

Trial

Trials practically never go according to plan, so contingency plans always have to be prepared. Common contingencies are:

  • The prototype does not do what is expected. This happens often, given the experimental nature of the tool and the fact that controllers may not use it in exactly the way the designers intended.
  • One of the computer systems breaks down because the prototype is a real-time, multi-platform system built only to research quality.
  • A phone or radio fails.
  • Staff taking part in the simulation are missing for part of the trial. This event is very likely because between 15 and 20 people take direct part in the simulation during a period of weeks, and meeting the needs of the operational ATC system must take priority over trials.

In addition, there are many other events that are unpredictable and that arise mainly from having a large group of people working together for a significant period of time using an experimental system that in most situations is new to them. To deal with contingencies, spare exercise slots are timetabled, and backup staff, spare components etc are planned prior to the trial. The actual trial timetable is reviewed as the trial unfolds to ensure the best value is obtained from the time available.

Analysis of trial results

As the data recorded in the trial are extracted and become available for analysis, the multi-disciplinary team tasked with analysing the results discusses the initial results and identifies ways forward in the analysis. The OR analysts, psychologists and software engineers take part in these decisions. Some results require the assistance of controllers who took part in the trial to explain them. As discussed in Marsh (1998), one issue that complicates the analysis of trial results is reconciling objective, subjective, quantitative and qualitative data and striking the right balance between controlled measurement and expert opinion.

Controller opinions are essential to the evaluation, since controllers will be the users of the system and are the ones responsible for the safety of aircraft. As mentioned, trials assess whether controllers find the system easy to learn and to use, and whether they find that it affects their workload. Controller opinions can also reveal new uses for the DSS, and advantages and disadvantages that had not been anticipated. In addition, they lead to further development and refinement of the DSS.

However, opinions are an inaccurate way of measuring quantitative effects. Controllers may say that with the DSS they spend less time speaking on the radio, but their estimate of how much less is naturally imprecise; they may even be wrong in their assessment. Opinions are also subjective: they often differ from person to person. Quantitative measurements, such as the time spent by controllers on the radio, are needed to evaluate the impact of the DSS in a consistent way. The quantitative estimates obtained in the trial will not be exactly the same as those which would be achieved in a real environment, since a simulation cannot replicate a real environment exactly. However, such quantitative measurements can be used to ascertain whether the DSS has a statistically significant impact on workload, capacity etc compared to the simulated baseline. Controller opinions and comments complement the statistical analysis in that they can back up, explain or throw doubt on the quantitative results.

Statistical analysis of the data has to be done with some degree of scepticism. Careful attention is paid to outliers because they can reveal problems in data recording and extraction. For instance, some of the system breakdowns during the trial affect data recording but might otherwise go unnoticed until the data are extracted and analysed. On the other hand, outliers might correspond to events of particular interest, such as an unexpectedly effective way of using the tool, so they are not excluded lightly from the analysis.
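The article does not prescribe a particular screening method; one simple possibility is an interquartile-range rule of the kind sketched below, used to flag values for review rather than to discard them automatically. The data and threshold are illustrative assumptions.

```python
# Illustrative outlier screen using the interquartile-range rule; the method
# and the data are assumptions, not the trial's actual procedure. Flagged
# values are reviewed, not discarded, because an outlier may reflect a
# recording fault or a genuinely interesting event.
from statistics import quantiles

def flag_outliers(values, k=1.5):
    q1, _, q3 = quantiles(values, n=4)       # sample quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

radio_time_s = [310, 295, 322, 301, 288, 640, 315]  # per-exercise, invented
print("Values to review before analysis:", flag_outliers(radio_time_s))
```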

If the conclusions of the trial are favourable to the development of the DSS, the next step, typically, is to further test it in a high-fidelity, large-scale real-time simulation before introducing it into operations.

Lessons in DSS validation

A successful validation trial of a DSS requires the active involvement of the users, namely the air traffic controllers, both in the development of the prototype and in its evaluation. The preparation of the prototype's evaluation demands exhaustive attention to detail: the trial needs to simulate the operational environment closely, at least for the variables that are under evaluation. Although the analysis phase of the project is planned in advance, this has to be done with enough flexibility to allow the analysts to respond to the results that are emerging.

Analysing the trial's results requires playing devil's advocate with them. To detect errors in data collection, or limitations in the effectiveness of a measurement, it is important to examine abnormal results carefully (as well as normal ones). OR analysts need to have a firm common-sense understanding of what their data mean in ATC terms. They need to be open to opinions from all the participants in the validation: controllers, their managers, psychologists etc. There can be important benefits and disadvantages of the DSS that are only revealed in the evaluation, and these have to be pointed out. The limitations of the evaluation method also have to be recognised, together with the impact they may have on the conclusions of the validation.

Conclusions on the impact of the DSS on safety, workload, quality of service or usability have to be based on a degree of consistency between the different quantitative and qualitative measurements. Use of a broad range of measurements is essential to the development of a robust business case for the introduction of a new DSS.

Validation has a key role in the development of new DSS for air traffic controllers. It enables the study of how a new system of people and machines will behave in its early stages, before reaching a situation when it would be very difficult and costly to go back or to make changes. Validation explores how best to implement the DSS and provides the necessary confidence to go ahead with its development.

Abbreviations

ATC  Air Traffic Control
ATM  Air Traffic Management
DASR  Department of ATM Systems Research, National Air Traffic Services Ltd.
DSS  Decision Support System
EUROCONTROL  European Organisation for the Safety of Air Navigation
FAST  Final Approach Spacing Tool
HIPS  Highly Interactive Problem Solver
ISA  Instantaneous Self Assessment
NASA  National Aeronautics and Space Administration
NATS  National Air Traffic Services Limited
PUMA  Performance and Usability Modelling in ATM toolset
TLX  Task Load Index

For the interested reader

Marsh D (1998), Real-Time Simulation Trials in Air Traffic Control: Questions in Practical Statistics, Conference Proceedings of Mathematics in Transport Planning & Control, Institute for Mathematics and its Applications, Cardiff University, 1 to 3 April.

Phillips M R and Marsh D T, The validation of fast-time air traffic simulations in practice, J Opl Res Soc (forthcoming).

Marsh D, Evans A, Russell S and Smith C (1998), Final Approach Spacing Tool: Real-Time Trial, June 1998, Main Report, R&D Report 9868, National Air Traffic Services Limited, London.

Pomroy N, Brown S, Duck R, Marsh D and Roberts A (1997), Application of HIPS to Oceanic Control, R&D Report 9710, National Air Traffic Services Limited, London.

Paula Leal de Matos is assistant professor of economics and management at the Technical University of Lisbon (IST). She previously worked for the real-time trials team in the Department of Air Traffic Management Systems Research of National Air Traffic Services Ltd. She holds degrees in Economics, Operational Research and Statistics, and a PhD from Warwick Business School. Apart from teaching, she is currently working on research projects that focus on decision support for air traffic control and on economic aspects of air traffic management.

David Marsh manages the real-time trials programme for the Department of Air Traffic Management Systems Research of National Air Traffic Services Ltd. After taking degrees in Philosophy and in Mathematics he worked for the Smith Group for 8 years, devising and assessing ways to apply technology to a variety of operational problems in the military, transport and other businesses, with a particular interest in 'softer' human-centred problems. He has been in his current role for 4 years. While his growing family sleeps, he is working on another degree, in Statistics.

First published to members of the Operational Research Society in OR Insight January - March 2000