# Non-classical complex metrics

## Introduction

### General objectives

This article explores non-classical metrics in general, and complex performance metrics in particular, establishing first the vital context of complex networks in which the latter need to be set. We will refine our definitions of non-classical metrics and their relationship with complexity metrics. These are in contrast to classical metrics, which describe most of the metrics currently in use in air transport performance assessment.

The metrics we use to assess any system are, to a large degree, determined by the way in which we think about that system. More specifically, they are driven by the framework we use to formalise it. A classical formulation of an air transport system, such as the European airspace, will naturally engender classical metrics, such as average delay.

By extending and complementing this framework, through consideration of the system as a complex network, we significantly advance the state of the art by embracing a new set of analytical methods and corresponding metrics. In essence, we move from what may be described as a fundamentally linear (and often univariate) view of the system, towards an attempt to understand and characterise it using non-linear and multivariate methods. For this reason, a substantial part of this article will address the issues of complex network scales, an area in which much research remains to be carried out, establishing the future landscape for the development of new metrics in ATM, and air transport in general. Whilst many of the examples we cite relate to delay, the future scope of this discussion will be extended to other KPAs.

A key objective is to optimally characterise and improve the understanding of ATM performance. By first better characterising the European air transport system through a better exposition of the topological properties of the network, we can design better metrics in the context of this topology. It is hoped that these two necessarily parallel processes will lead to new insights into ATM performance, which are not currently available to us, driven in large part by the design of improved metrics. The best metrics can only be designed in the best contextual understanding.

The Commission's roadmap to a Single European Transport Area for 2050 (White Paper; European Commission, 2011b) opens by emphasising the importance of mobility and stressing the broader (international) scope of the network: “Transport is fundamental to our economy and society. Mobility is vital for the internal market and for the quality of life of citizens as they enjoy their freedom to travel. […] Transport is global, so effective action requires strong international cooperation.”

The design of new air transport metrics does not, of course, occur in a vacuum. The corresponding social, political, regulatory and technological contexts have to be taken into account, in addition to the state of the art. We discuss this further here. We will also examine the meaning of mobility in the complex network context.

• Explore the development of new metrics in ATM to improve the understanding of its performance by appropriately extending the scope of the framework;
• Set this exploration in the context of complexity science and network topology;
• Align this research with the evolving political and regulatory framework.

### Definitions

The exploration of non-classical and complex performance metrics needs to be set in the context of complex networks. The corresponding, basic terms were introduced here. We also need to differentiate between the two types of metric to be discussed. We use the term ‘classical’ metrics to denote those that are pre-defined (such as average aircraft delay), are univariate (draw on one variable in the data), and do not draw on complexity science techniques. Some of these types of metric are already commonly in use (such as, indeed, average aircraft delay), whilst others are not (such as average passenger delay) – and, arguably, thus conspicuous by their absence.

‘Non-classical’ metrics are defined to include both (non-complexity) ‘derived’ metrics, which are in contrast to the classical metrics in that they are not (fully) pre-defined but are derived from the data iteratively and are typically multivariate, and those drawn from complexity science. An example of a derived metric is a factor obtained as the result of factor analysis (see Comparing complexity science tools with non-complexity methods in describing key system dynamics). An example of a (simple) complexity metric is the degree of a node. Figure 4.1 shows that these relationships are not wholly mutually exclusive. With regard to the metrics, our discussion will focus on the non-classical metrics set and its subset, complexity metrics. Data mining techniques may be applied not only to generate non-classical metrics but also in topology characterisation, such as identifying complex network communities (groups of densely connected nodes, sharing only few connections with nodes outside their group). These techniques are not needed to calculate classical metrics.

### Scope

As we have mentioned, in the design of new air transport metrics, the corresponding social, political, regulatory and technological contexts have to be taken into account, in addition to the state of the art. Here, we will address the state of the art through our reporting on the literature. The social, political and regulatory contexts will be captured primarily through reference to high-level European policy documents. No review has been undertaken of the literature of passenger surveys in relation to this issue, such that we rely on European Commission policy, and the corresponding consultation processes, including inputs from the national enforcement bodies (which each member state must designate to receive complaints from passengers). The technological context is not discussed, although data provision and integration with SWIM could underpin interesting future work.

In terms of the types of metric, we will focus on complex performance metrics in particular, with some broader reference to (other) non-classical metrics. As for their scale of measurement, a joint survey run by two SESAR Workpackage E projects[1] revealed that many ATM stakeholders would find KPIs most useful if they were reported at high levels of temporal and spatial granularity, across the eleven ICAO KPAs adopted by SESAR; see, for example, the SESAR Performance Target (SESAR Consortium, 2006) and the SESAR Master Plan (SESAR Consortium, 2008). The levels indicated are notably more disaggregated than most current industry reporting. A number of stakeholders highlighted the need to tailor KPIs to the needs of the user and the objectives of the measurement. Some KPIs may generally be more appropriate at a network level (e.g. environmental KPIs), whereas others are more useful at a disaggregate level (e.g. delay-related KPIs used to improve future disruption management). Relevant spatial scales will depend, for example, on whether the stakeholder is an airline (e.g. for a route) or an ANSP (e.g. for an FIR), and temporal scales may be far more granular for a tactical application than a strategic one. In terms of spatial (geographical) preference, there were no very pronounced patterns across the respondent groups, although more than half of the airline groups interviewed preferred to see the highest levels of disaggregation. Care needs to be taken not to assess performance over scales at which such performance cannot be realistically tackled.

The metrics we explore here are not constrained to particular spatial or temporal scales. However, the caveats above need to be considered with regard to their operational meaningfulness, and a degree of discretion is needed in the complex network context: metrics covering only a small number of airports in a single, narrow timeframe are likely to be limited in their usefulness and in the insights they offer.

Kantardzic (2011) explains how much of modern science is based on first-principle models to describe systems, starting with a basic model (such as Maxwell’s equations for electromagnetism, only later empirically proven), which is then verified (or otherwise) by experimental data used to estimate some of the parameters. However, in many domains, such first principles are not known, and/or the system is too complex to be formalised mathematically. Data mining (see here and here) moves us from the confirmatory to the exploratory. Driven by the significant advances in computational power and data availability, in the absence of such first-principle models the analyst is now able to derive models by estimating useful relationships (input-output dependencies) between a system’s variables: “Thus there is currently a paradigm shift from classical modelling and analyses based on first principles to developing models and the corresponding analyses directly from data.” We will cover the issue of networks and their classes in some detail, to establish this relatively immature (in ATM) but vital research context.

## Research lines

### Exploring how properties of the system change as a function of different temporal and spatial scales and of network representations

#### Problem statement

We require our characterisation of the air transport network to be informative in terms of offering new insights into performance. There is little point generating a topological framework, and set of metrics, which are not meaningfully related to any KPA. A useful framework, and corresponding metric set, should offer insights into a range of performance phenomena in ATM. Investigating scaling tells us how the properties of the system change, or are indeed invariant, over different temporal and spatial scales. Analysis thereof affords, it is hoped, insights into system dynamics.

One example would be to better understand the emergence of delay phase transitions from normal conditions in a particular community of airports – in other words, for no ‘apparent’ reason, delays here suddenly become severe (the critical thresholds through which complex systems pass are sometimes called ‘tipping points’). It is to be hoped that the methods of characterising and measuring complex networks will help us to understand such phase transitions, and other ATM performance phenomena that represent degradation of service. The aspiration here is to gain insights into, and capture through new metrics, idiopathic (typically non-linear) degradations, rather than classical cause-and-effect relationships (typically linear).

As we will see next in the literature review, the current examination of air transport networks is relatively immature, notwithstanding helpful early insights from some researchers. Key components which are currently missing from (most) complex network methodologies in this respect are:

• Including the temporal components of the network;
• Modelling/measuring delay propagation by tracking tail numbers and other flight-to-flight dependencies;
• Inclusion of the actual passenger connectivity perspective;
• Incorporation of delay costs.

We will develop these themes later. Further work is also needed to better understand how different network representations (e.g. with focus on passenger connectivity and mobility, as opposed to simple aircraft movements) may relate to different network properties and class characterisation, including their principles of preferential attachment, and the corresponding relationships with network performance. Another key question is how ATM networks demonstrate different vulnerabilities (particularly through hub failure) compared with other complex networks.

In characterising network scales, we identified in Section 4.1.2 that we may create specific network representations and therefore concomitant definitions of nodes, based on the context of what we are analysing. However, since airports are the foci of most air transport operations, it is likely that the nodes will in most, if not all, cases be located at airports, although this is not necessarily the case (airline delay costs could be modelled at nodes which are not geographically coincident with airports; en route delays could be modelled with regard to weather or capacity constraints). Non-complexity, non-classical metrics may also furnish comparative insights in this area (see the case study Comparing complexity science tools with non-complexity methods in describing key system dynamics).

#### Literature review

Paleari et al. (2010) cite several previous studies which have classified airport systems as small world, scale-free networks (see here), in which new links are relatively more likely to be appended to nodes with higher connectivity, which yields a power-law distribution of airport degrees. Indeed, the heterogeneous nature of many real, scale-free models is driven by the common factors of growth and preferential attachment (Albert et al., 2000). As Barabási et al. (1999) explain: “[…] the probability with which a new vertex connects to the existing vertices is not uniform, but there is a higher probability to be linked to a vertex that already has a large number of connections.” On the limits to preferential attachment, Amaral et al. (2000) discuss two limiting factors:

• ‘aging’ of the nodes;
• the cost of adding links to the nodes, or the limited capacity of a node.

‘Aging’ limits preferential attachment, preventing a scale-free distribution of connectivities. Rarely are airports ‘retired’ from the network; an example of an exception is the closure of Oslo Fornebu in 1998, with operations moving to the previously secondary airport, Oslo Gardermoen. Nevertheless, airport roles may (subtly) change as a function of the development of other airports, particularly those in the same catchment area.
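The growth and preferential attachment mechanism quoted above, together with a capacity limit on nodes, can be illustrated with a small simulation. This is only a sketch and is not taken from any of the cited papers: the function name `grow_network`, the seed network, the parameter values and the `max_degree` cap (a crude stand-in for a capacity constraint) are all assumptions made for illustration.

```python
import random
from collections import Counter

def grow_network(n_nodes, links_per_node=2, max_degree=None, seed=0):
    """Grow an undirected network by degree-proportional attachment.

    max_degree is a hypothetical cap standing in for a capacity constraint.
    Returns a Counter mapping node -> degree.
    """
    rng = random.Random(seed)
    m = links_per_node
    degree = Counter()
    # 'stubs' holds one entry per link end, so sampling uniformly from it
    # selects a node with probability proportional to its current degree.
    stubs = []
    # Start from a small, fully connected seed network of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
    for new in range(m + 1, n_nodes):
        targets = set()
        attempts = 0
        # The attempt limit guards against the case where too few
        # unsaturated nodes remain to supply m distinct targets.
        while len(targets) < m and attempts < 1000:
            attempts += 1
            candidate = rng.choice(stubs)
            if max_degree is not None and degree[candidate] >= max_degree:
                continue  # a saturated node cannot accept further links
            targets.add(candidate)
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            stubs += [new, t]
    return degree

free = grow_network(2000)
capped = grow_network(2000, max_degree=20)
print("largest degree, unconstrained:", max(free.values()))
print("largest degree, capped at 20: ", max(capped.values()))
```

Without the cap, a few early nodes accumulate a disproportionate share of the links; with the cap, the heavy tail of the degree distribution is truncated, which is the qualitative effect attributed above to limiting factors.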

It is of note that similar effects may be observed in the evolution of metabolic pathways, protein mapping and characterisation, and protein–protein interaction networks[2], in which field complexity science found much of its early application. Here, the network growths are driven by evolutionary factors and occur over much longer temporal scales, as compared with the largely anthropogenic factors and relatively much shorter (and much more recent) timescales applicable to air transport network growth.

As an example of the cost effect (in the literal sense), the authors comment on the limited capacity of airports, which cannot serve as hubs to a large number of airlines, and with airlines preferring to have a small number of such hubs. In a separate issue from the hub effect per se, we should also consider that airports may reach saturation points in pure traffic terms, for example due to infrastructure constraints (usually runway throughput is the rate-limiting step) and/or environmental restrictions. As Paleari et al. (ibid.) comment, political barriers (and regulations) may also determine the actual growth of an air transport network.

In summary, we may describe three general factors pertinent to the development of airports as nodes in air transport networks:

• market effects (maturity and demand);
• capacity constraints;
• regulatory/policy factors.

Using Airports Council International (ACI) data for 1999, Amaral et al. (2000) study the network of the “world’s largest airports” (sample size not given), with airports as nodes and direct connections as links. The authors state that the network of world airports is a small-world network in which any two airports can be connected in 1–5 links. The number of passengers in transit was used instead of data on the number of distinct connections. Assumptions were made that:

• There is a typical number of passengers per flight[3];
• There is a typical number of flights per day between two cities[4];
• The number of distinct connections from major airports is proportional to the number of passengers in transit through that airport.

From a linear-log plot of the cumulative distribution, the authors concluded that these data did not display a power law regime and had an exponentially decaying tail, implying single-scale connectivity. Analyses of cargo data at these airports produced similar results.
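The diagnostic described here can be sketched in a few lines. An exponentially decaying tail appears as a straight line when the logarithm of the cumulative distribution is plotted against the value itself (linear-log axes), whereas a power law is straight when plotted against the logarithm of the value (log-log axes). The sample below is synthetic and the helper names `ccdf` and `r_squared` are illustrative assumptions; none of this reproduces the authors' data or code.

```python
import math
import random

def ccdf(values):
    """Empirical cumulative distribution P(X >= x) at each distinct x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))
    return points

def r_squared(xs, ys):
    """Coefficient of determination of a least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

# Synthetic sample with an exponential tail (mean 50), for illustration only.
rng = random.Random(1)
sample = [rng.expovariate(1 / 50) for _ in range(5000)]

points = ccdf(sample)
x = [p[0] for p in points]
log_p = [math.log(p[1]) for p in points]

fit_linear_log = r_squared(x, log_p)                      # straight if exponential
fit_log_log = r_squared([math.log(v) for v in x], log_p)  # straight if power law
print(f"linear-log R^2 = {fit_linear_log:.3f}, log-log R^2 = {fit_log_log:.3f}")
```

A markedly better straight-line fit on linear-log axes than on log-log axes points towards single-scale (exponential) behaviour. A goodness-of-fit comparison of this kind is only a rough guide, and a formal statistical test would be needed before drawing firm conclusions.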

In early work on a SESAR Workpackage E project (ELSA) aiming to analyse the present ATM system and to use such results in an agent-based model of the SESAR business trajectory scenario, Lillo et al. (2011) cite a range of papers using network theory to model air transport systems. Despite this research being in its earliest stages at the time of reporting, some useful results are reported. For one day of European data: the cumulative probability of the strength ($s$, number of flights) per airport has a fat, non-exponential tail; the cumulative probability of the degree ($k$, number of airports connected with a direct flight with the airport) has an exponential tail; and the cumulative distribution of link weight ($w$, number of flights per route) has a tail well approximated by an exponential, although all three distributions were highly heterogeneous. In a double-logarithmic plot of strength and degree, however, a power law is observed in the region of $k>20$. An exponent of 1.39 indicated a superlinear relationship, such that if an airport doubles the number of destinations, it would (typically) increase the number of flights by a factor of approximately 2.62 ($2^{1.39}$).
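The doubling factor follows directly from the power-law form. Writing the exponent as $\beta$ (a symbol introduced here for convenience, not taken from the cited paper), and assuming the relationship holds over the range considered:

$$
s \propto k^{\beta}, \qquad \frac{s(2k)}{s(k)} = \frac{(2k)^{\beta}}{k^{\beta}} = 2^{\beta}, \qquad 2^{1.39} \approx 2.62 .
$$

A linear relationship ($\beta = 1$) would give a factor of exactly 2, so any $\beta > 1$ means that flights grow faster than destinations.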

Paleari et al. (2010)[5], using complex network theory and associated metrics, investigate the connectivity of the airport networks in Europe, the US and China. The objective was to determine which network is most beneficial to passengers in terms of travel time and which network features produce such an outcome. The US network has the most nodes; the European network has the most direct connections. In Europe the major airports are all relatively close together, whereas in the US and China they are relatively dispersed.

The models all used one day of scheduled data in 2007, published by Innovata, and examine possible connections, with no maximum connecting time assumed (although the itineraries need to be able to be completed within one calendar day). The ratio between in-flight distance and potential direct flight distance was also not permitted to exceed 1.25.
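The routing constraint can be expressed as a simple filter on candidate itineraries. The sketch below assumes that both distances are great-circle distances between airports, which the text does not state explicitly; the function names and the approximate coordinates are illustrative only.

```python
import math

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def is_admissible(itinerary, max_ratio=1.25):
    """True if flown distance / direct distance does not exceed max_ratio."""
    flown = sum(great_circle_km(p, q) for p, q in zip(itinerary, itinerary[1:]))
    direct = great_circle_km(itinerary[0], itinerary[-1])
    return flown <= max_ratio * direct

# Approximate airport coordinates (lat, lon), for illustration only.
OSL, FRA, FCO, LIS = (60.19, 11.10), (50.03, 8.57), (41.80, 12.25), (38.77, -9.13)

print(is_admissible([OSL, FRA, FCO]))  # connection lying roughly on the direct route
print(is_admissible([OSL, LIS, FCO]))  # connection involving a large detour
```

A filter of this kind removes connections that are technically possible but that few passengers would regard as reasonable routings.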

The topological analysis confirmed that all three airport systems belong to the class of small-world networks, with similar degree distributions and clustering coefficients (the clustering coefficient being the probability that two airports connected with a third one are also directly connected to each other).
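The clustering coefficient as described here can be computed for a single airport by counting how many pairs of its neighbours are themselves directly linked. The network below is hypothetical and the function name is an assumption made for illustration.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adjacency[node]
    if len(neighbours) < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0 here
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in adjacency[a])
    return linked / len(pairs)

# Hypothetical undirected network: hub H serves A, B and C;
# of those three, only A and B are directly connected to each other.
adjacency = {
    "H": {"A", "B", "C"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
}

for airport in adjacency:
    print(airport, round(clustering_coefficient(adjacency, airport), 3))
```

Averaging this value over all nodes gives one common network-level clustering coefficient; whether the cited study used this or another variant is not stated here.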

Configurations were found to be similar to the scale-free power law, although with better fits obtained with a double Pareto law[6] (see Figure 4.2, where the ‘knee’ of the double Pareto is clearly visible). The European network showed the highest values of the power law exponents, meaning that its degree distribution decays more rapidly.

#### Research challenges

Better Alignment of Models and Metrics with Actual System Behaviour.

We have mentioned earlier that investigating scaling tells us how the properties of the system change, or are indeed invariant, over different temporal and spatial scales. With specific regard to network characterisation, and as raised in Section 4.2.1.1, significant challenges facing the application of complexity science in the context of air transport in general, and ATM in particular, are including the temporal dimensions of the network (largely a modelling challenge), modelling delay propagation by tracking tail numbers and other flight-to-flight dependencies[7] (largely a data availability issue), the inclusion of the actual passenger connectivity perspective (again, largely a data issue) and the incorporation of delay costs. These issues also impact the development of corresponding metrics, since the two themes are integrally linked, as we have discussed. Indeed, the delay cost is arguably more central to the theme of metric development, so we will develop this discussion a little further in Section 4.2.2.3. Paleari et al. (2010) explain the challenge of the temporal dimensions:

“The main drawback of complex network analysis is its failure to take into account the temporal coordination of airport networks. In many cases, the additional utility derived from connecting to a high-degree airport is the chance to use this airport as an intermediate step to other destinations. But an interconnection is really feasible only if incoming and outgoing flights at the intermediate airport occur within a reasonably narrow window.”

Whilst in this time-dependent, minimum path approach, only possible connections with no maximum connecting time were assumed, it is important that future work considers actual passenger connectivity to correctly represent the network flows, and consequent characteristics (and, complementarily, when considering metrics, the corresponding performance).
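The temporal feasibility condition can be made concrete with a minimal sketch that pairs inbound and outbound flights at one intermediate airport. The flight identifiers, times and window limits are hypothetical; setting the upper limit to infinity corresponds to the case in which no maximum connecting time is assumed.

```python
def feasible_connections(arrivals, departures, min_connect=45, max_connect=180):
    """Pair inbound and outbound flights at one airport by connecting time.

    Times are minutes after midnight; the window limits are illustrative.
    """
    pairs = []
    for inbound, arr in arrivals:
        for outbound, dep in departures:
            if min_connect <= dep - arr <= max_connect:
                pairs.append((inbound, outbound))
    return pairs

# Hypothetical schedule at a single intermediate airport.
arrivals = [("IN1", 8 * 60), ("IN2", 9 * 60 + 30)]
departures = [("OUT1", 8 * 60 + 20), ("OUT2", 10 * 60 + 30), ("OUT3", 16 * 60)]

windowed = feasible_connections(arrivals, departures)
unbounded = feasible_connections(arrivals, departures, max_connect=float("inf"))
print(len(windowed), "connections with a maximum connecting time")
print(len(unbounded), "connections with no maximum connecting time")
```

The difference between the two counts illustrates how a purely topological or loosely constrained view can overstate the connectivity that passengers actually experience.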

A model or network representation, which addresses these hitherto unmet challenges, i.e. in a temporally robust framework which includes both flight-to-flight dependencies and an accurate reflection of passenger flows, will allow us to better characterise (and/or model) the essential properties of the network in a fundamentally more accurate way, rather than based on assumptions about such behaviours, which researchers have often been forced to make due to data inadequacies and/or lack of computational power. The former constraint remains a barrier in many cases, whilst the latter is becoming less problematic with ever improving hardware and software. Until such time as these significant advances are made, we are left with an (at best) partial understanding of the behaviour of the European air transport and ATM network, which is likely to inhibit our ability to mature the solutions we can develop. In his paper, Strogatz (2001) concludes:

“In the short run there are plenty of good problems about the nonlinear dynamics of systems coupled according to small-world, scale-free or generalized random connectivity. The speculations that these architectures are dynamically advantageous (for example, more synchronizable or error-tolerant) need to be sharpened, then confirmed or refuted mathematically for specific examples. Other ripe topics include the design of self-healing networks, and the relationships among optimization principles, network growth rules and network topology […] In the longer run, network thinking will become essential to all branches of science as we struggle to interpret the data pouring in from neurobiology, genomics, ecology, finance and the World-Wide Web. Will theory be able to keep up?”

Many of these challenges remain just as pertinent to the application of complexity science to ATM today.

### The need to embrace complexity science in order to effectively extend our set of useful metrics in ATM

#### Problem statement

The current set of metrics in use in air transport is not fully adequate to the task of measuring new operational objectives, most notably those driven from high-level European political agendas relating to improved service delivery to the passenger. For example, how are we to measure the effectiveness of new passenger-driven performance initiatives in air transport in general, and ATM in particular, if we do not have the corresponding set of passenger-oriented metrics? (The average delay of a delayed flight and the average delay of a delayed passenger are not the same, the latter normally being the higher.) The same question holds for efforts to understand and reduce delay propagation, for which we currently have few metrics, although the problem is so frequently cited in discussions of ATM.
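The difference between the two averages mentioned in parentheses can be shown with a small worked example. All figures below are hypothetical: three of four flights arrive late, and twelve passengers from the most delayed flight miss an onward connection and reach their final destination four hours late.

```python
# Hypothetical arrival delays per flight, in minutes.
flight_delays = [10, 20, 45, 0]

# Hypothetical passenger groups: (number of passengers, delay in minutes
# at final destination). The group of 12 missed a connection.
passenger_groups = [(60, 10), (150, 20), (168, 45), (12, 240), (100, 0)]

delayed_flights = [d for d in flight_delays if d > 0]
avg_flight_delay = sum(delayed_flights) / len(delayed_flights)

delayed_groups = [(n, d) for n, d in passenger_groups if d > 0]
avg_passenger_delay = (sum(n * d for n, d in delayed_groups)
                       / sum(n for n, _ in delayed_groups))

print("average delay of a delayed flight:   ", avg_flight_delay)
print("average delay of a delayed passenger:", avg_passenger_delay)
```

In this example the flight-centric average is 25 minutes, whereas the passenger-centric average is 36 minutes. The gap arises because more passengers travel on the more delayed flights and because a missed connection adds far more delay than the inbound flight itself incurred.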

Furthermore, insufficient complexity science research has, as yet, been applied to the domain of air transport in general, and ATM in particular. The types of measures, which belong to the tool set of complexity science, focussing on the quantification of network performance and connectivity, lend themselves naturally to application within the ATM context.

As with the objective described in Section 4.2.1 (investigating the network scales and topologies of European ATM), the overall aim of the research theme in this section, developing new metrics to better assess and improve ATM, is also to optimally describe ATM through better insights into its dynamics. However, whereas in Section 4.2.1, the focus is on characterisation, here, the complementary objective is on ways to better quantify and understand performance. Both of these broader objective threads are mutually necessary components of the overall ambition to improve actual ATM operations. Improved characterisation and performance assessment are of little value if they cannot be applied to such practical amelioration.

We also discuss the concept of airports as delay ‘multipliers[8]’ in the context of metrics used in complexity science, such as centrality metrics, and the associated role of community analysis. Problems facing researchers in this area include making appropriate contextual selections of nodal representations (such as flight delay minutes, passenger delay costs to the airline, passenger value of time) and establishing that observed relationships are actually causal.

#### Literature review

In this section we will briefly touch upon several themes, focusing on how certain ATM characteristics, such as delay propagation, may be captured through metrics from the tool set of complexity science. These themes will be linked back to the broader, complementary context of scale characterisation (discussed in Section 4.2.1), and we will briefly examine the important political and regulatory context of metrics (raised in Section 4.1.1).

Understanding the propagation of delay through the air transport network is perhaps one of the most important contributions that complexity science may bring to ATM. The concept of airports as ‘delay multipliers’ is discussed (under various terminologies) in the literature[9], although still relatively rarely in the context of complexity science. Airline hub airports are often suggested as foci for the strongest tendency to propagate delay. When congestion occurs, delays incurred earlier in the day may be expected to propagate back more often at airports/hubs with higher proportions of short-haul flights and this may also spread tactical demand more evenly than scheduled (Pyrgiotis, 2011). The back-propagation effect at hubs is also cited by Jetzki (2009).

Indeed, in those network topologies where airports are nodes, it is to be noted that the behaviour of these nodes is to a large part determined by the type of carriers operating there, and their business model. Airports that tend to pass on a higher proportion of their delay are mostly served by carriers that have higher proportions of delay propagation, although this does not mean that these airports experience large absolute amounts of delay (Baden et al., 2006). The intensity of a carrier’s operation will also determine how it is able to deal with disruption, through, for example, aircraft and/or crew swaps, and how it can recover from a cancellation. Table 4.1 summarises some of the key characteristics[10] of centrality metrics which we propose may be applied to good effect in furthering our understanding of ATM.

Summary descriptions of centrality metrics.

| Centrality metric | Summary description |
| --- | --- |
| degree of centrality | the number of connections a node has, or, in other words, the number of neighbours; the greater the degree of centrality, the more important that node is, functionally, within the network |
| strength of centrality | similar to the degree of centrality but used when a numerical value (weight) is associated with each link (i.e. when we are dealing with a weighted network) |
| betweenness of centrality | the number of shortest paths (taking into account all pairs of nodes) which pass through a node; nodes with high betweenness are usually those nodes that connect different communities, e.g. in the ATM context allowing perturbations to spread between different parts of the system |

When nodes are defined to represent some parameterisation of delay, for example, if we have a few nodes with a very high degree, this would suggest that those nodes were responsible for the propagation of delay in the network. On the other hand, if all nodes have more or less the same degree, no delay multiplier node is suggested.

Similarly for the distribution of the strengths: if some nodes have extremely high values, they are concentrating the delay correlations in the network, and therefore are likely to be delay multipliers. Those with low values are more likely to be delay sinks.
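The degree and strength of each node can be computed directly from a weighted link list. In the sketch below the node labels and link weights are hypothetical, the weights standing in for whatever parameterisation of delay relates two nodes.

```python
from collections import defaultdict

# Hypothetical weighted, undirected links: (node, node, weight).
links = [("A", "B", 0.8), ("A", "C", 0.6), ("A", "D", 0.7),
         ("B", "C", 0.1), ("D", "E", 0.2)]

degree = defaultdict(int)      # number of links attached to each node
strength = defaultdict(float)  # sum of the weights of those links
for a, b, w in links:
    for node in (a, b):
        degree[node] += 1
        strength[node] += w

# Rank nodes by strength, highest first.
ranked = sorted(strength, key=strength.get, reverse=True)
for node in ranked:
    print(node, degree[node], round(strength[node], 2))
```

Here node A stands out on both measures and would be a candidate delay multiplier under the reasoning above, whereas node E, with a single weak link, looks more like a sink. Such a ranking only suggests candidates; it does not by itself establish that the relationships are causal.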

The betweenness is a metric measuring how important a node is for the movement of information in a network. It can be applied both to nodes and links (examples of the latter include en-route and arrival queuing delay). It is defined as the proportion of shortest paths between all pairs of nodes in the network (a shortest path being the shortest sequence of links connecting two nodes) that pass through a given node (or link). Betweenness thus analyses how information (here, specifically delay attributes) moves through the whole network (White and Borgatti, 1994), whereas the degree of centrality is a local metric, taking into account only the structure of the network around the node.

Figure 4.3 is a sub-graph of a hypothetical, larger, ATM network. There are two main communities of nodes: the red and the green ones. These may correspond to two logical communities (or, indeed, they may not appear to be logical, or expected – see Comparing complexity science tools with non-complexity methods in describing key system dynamics). The blue node in the middle has a very low degree (just two connections) although it is very important in the network: if some delays are generated in the red community, they may propagate through the blue node, and on into the green community. The betweenness of the blue node will be (relatively) very high. Whatever it is that we are measuring in the above representation of nodes (flight delay propagation per se, or passenger cost propagation per se), the ‘flow’ from the red community into the green community is ‘channelled’ through the blue node. As we mentioned above, corresponding metrics can be calculated using the betweenness of links: in this case, the two links into and out of the blue node will also have very high betweenness. If we succeeded in improving the dynamics of the blue node through some operational change, we might be able to improve system performance by breaking (or diminishing) the propagation chain between the red and green communities. Note that it is important that we establish that these effects are causally related, such as through Granger causality analysis (see Section 3). Since some airports, especially major hubs, impact the passenger trip delays significantly more than others, recognition of this asymmetric performance can help reduce the total passenger trip delay propagation.
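The situation described for Figure 4.3 can be reproduced on a toy network. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; the seven-node network (two triangles joined through a single bridging node X) is an assumption chosen to mimic the figure, not a reconstruction of it. The function returns the number of shortest paths through each node; dividing by the number of node pairs would give the proportion.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs.

    Returns, for each node, the number of shortest paths between other
    node pairs that pass through it (fractional when several paths tie).
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        dist = {source: 0}
        n_paths = {v: 0 for v in adjacency}
        n_paths[source] = 1
        preds = {v: [] for v in adjacency}
        order = []
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):  # accumulate back towards the source
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair of nodes has been counted twice.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical network: a 'red' triangle and a 'green' triangle,
# connected only through the bridging node X.
adjacency = {
    "R1": ["R2", "R3", "X"], "R2": ["R1", "R3"], "R3": ["R1", "R2"],
    "X": ["R1", "G1"],
    "G1": ["G2", "G3", "X"], "G2": ["G1", "G3"], "G3": ["G1", "G2"],
}

result = betweenness(adjacency)
for node in adjacency:
    print(node, len(adjacency[node]), result[node])
```

Node X has only two links, yet every shortest path between the two triangles passes through it, so it has the highest betweenness in the network. This is the pattern the text associates with nodes that channel perturbations between communities.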

Community analysis aims to detect communities, i.e. groups of highly connected nodes, by only using the information encoded in the topology of the network. Whilst many studies have focused on the simple partition of the network, more sophisticated approaches have been proposed: for example, the identification of hierarchical (or multi-scale) structures inside communities, or the identification of overlapping communities, i.e. situations where a node may be part of two communities simultaneously.

Underlining the need to explore the metrics in the context of the corresponding scales, we observe that when analysing network-based representations of real systems, the resulting topologies show structures across different scales. From the microscale point of view, it may be relevant to check how some nodes have a higher connectivity, or how they are more ‘central’ for some process occurring in the network. From the macroscale point of view, we can describe the global structure of the network: for example, is it densely connected, or are there relatively few links? Relatively recently, it has been shown (Palla et al., 2005; Girvan and Newman, 2002) that most of the relevant information is codified between these scales, in what is called the ‘mesoscale’ of the system (see also Section 3 regarding these scale types). Some of the main concepts included at this scale are communities and ‘motifs’ (specific patterns of functional connections between a small group of nodes).

Before concluding our brief description of metrics, we return to two papers cited earlier. Lillo et al. (2011), in the early stages of their work, expected the way the fraction of delayed flights depends on the network average path lengths to be of particular interest in their evolving research. Paleari et al. (2010), as referred to earlier, used a time-dependent, minimum path approach in the three air transport networks examined, to calculate the minimum travel time between each pair of airports. An average ‘as the crow flies’ speed of connections is the main indicator employed in each network to assess its overall level of service to passengers. In this respect, the US network performs better than the European network, due to longer origin-destination distances and a higher level of coordination in intermediate airports, thanks to a longer period of liberalisation history.

On the other hand, in the European network, this level of service deteriorates least as one passes from major to small airports. One reason cited (ibid.) is that European governments and authorities have enhanced the connectivity of remote regions: Europe has the highest percentage of airports accessible within one day, although the lack of coordination at the European level produces longer waiting times. This reminds us of the need to consider metric development in the wider policy context, both for understanding results and for the better design of new metrics.

Social and political priorities in Europe are now shifting in further favour of the passenger, as evidenced by high-level position documents (European Commission, 2011a; European Commission, 2011b). Metric design needs to follow the progress of planned regulatory review in this area, particularly with regard to the underpinning regulatory instrument – Regulation 261, the European Union’s air passenger compensation and assistance scheme, European Commission (2004). An example of the need for metrics to take account of changing regulation is the potential extension (European Commission, 2011c) of the legislation to cover passengers’ missed connections, which is neither covered by current law nor current metrics. A roadmap for the possible legislative revision to Regulation 261 was published in December 2011 (European Commission, 2011c). In a parallel initiative, SESAR is chairing one of the main ACARE working groups to define a new European air transport research and innovation perspective: "[…] placing aviation in a global, intermodal context that will put the air transport passenger/customer at the core of the system." This work will form part of the Commission’s ‘Strategic Research and Innovation Agenda’, due to be published in July 2012, which incorporates elements of its policy on ‘Horizon 2020’ and the Commission’s White Paper (European Commission, 2011b).

The imperative of embracing the passenger perspective in the development of new ATM metrics is not only driven by emerging policy, but underlined by the fact that the average delay of a delayed flight and the average delay of a delayed passenger are not the same, the latter normally being the higher, as mentioned. Using large data sets for passenger bookings and flight operations from a major US airline, Bratu and Barnhart (2004) show how passenger-centric metrics are superior to flight-based metrics for assessing passenger delays, primarily because the latter do not take account of replanned itineraries of passengers disrupted due to flight-leg cancellations and missed connections. These authors conclude that flight-leg delays severely underestimate passenger delays.

#### Research challenges

Designing More Operationally Useful Metrics.

In Section 4.2.2.1, we referred to some of the key challenges of developing new metrics to better assess and improve ATM. Generically, these include:

1. devising metrics that measure various aspects of delay propagation, which is currently very poorly represented in the set of metrics in use, despite its clear importance;
2. devising metrics that capture positive qualities of the transportation network, such as mobility; Paleari et al. (2010) summarise an example concisely as: “Mobility measures are based on the minimum path between any given pair of nodes. In the simplest case, they may represent the number of steps needed to travel from one node to another. In more complex cases, each step is weighted by one or more proxies for the importance of nodes and edges. In the field of airport networks, connections may be weighted by frequency of operation, number of seats offered, or geographical distance”;
3. better aligning the metrics with policy, to reflect the ultimate end-user of ATM, the passenger, and to recognise that flight-centric metrics are not a suitable surrogate in their own right, although they do clearly bring value to the set of metrics;
4. establishing causality between observed relationships (see Section 3);
5. effectively capturing measures of dispersion (such as standard deviation and excess kurtosis) in the metrics and thus including (un)predictability; this means not focusing too heavily on high-level measures of central tendency, such as averages;
6. applying the tools of complexity science to the development of metrics, since this discipline lends itself naturally to the measurement of network performance, covering the scope of items 1 through 5.
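The point about dispersion can be made concrete with a small sketch. The two delay samples below are invented for illustration: they share the same average delay, yet differ markedly in standard deviation and excess kurtosis, so an average alone would not distinguish them. Excess kurtosis is computed here from population moments as the fourth central moment divided by the squared variance, minus 3.

```python
from statistics import fmean, pstdev

def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a normal distribution)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0

# Hypothetical delay samples in minutes (illustrative values only).
steady = [8, 9, 10, 11, 12, 8, 9, 10, 11, 12]   # delays clustered around the mean
heavy_tail = [2] * 9 + [82]                      # mostly small, one severe delay

for name, sample in (("steady", steady), ("heavy_tail", heavy_tail)):
    print(name, fmean(sample), round(pstdev(sample), 2), round(excess_kurtosis(sample), 2))
```

Both samples average 10 minutes, but the second is far less predictable: its standard deviation and excess kurtosis are both much larger, reflecting the single severe delay.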

Specific research barriers include:

1. Precisely defining ‘delay multiplier/sink’ airports, and identifying these using appropriately comprehensive data at the correct scale (spatial, temporal and categorical – e.g. flight-centric, passenger-centric, other, or multiple);
2. Correctly modelling delay costs over networks, taking into account the non-linearity with respect to delay duration (and thus that delay minimisation is not the same as delay cost minimisation), recognising the specific scales of such calculations with respect to tactical and strategic costs of delay (and the relationship between them, for example through item 5 of the preceding list), and differentiating between delay costs to the airline (for example) and the passenger value of time (see Cook and Tanner (2011) for a discussion of these issues);
3. The high price of commercially available, full passenger itinerary data.
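To illustrate why delay minimisation is not the same as delay cost minimisation, the sketch below uses a convex cost curve. The quadratic form and its coefficients are assumptions chosen purely for illustration; they are not the reference values of Cook and Tanner (2011).

```python
def delay_cost(minutes, linear=1.0, quadratic=0.05):
    """Hypothetical convex cost (arbitrary units) of a single flight's delay."""
    return linear * minutes + quadratic * minutes ** 2

def network_cost(delays):
    """Total cost over a set of flight delays."""
    return sum(delay_cost(d) for d in delays)

# Two hypothetical ways of distributing delay over two flights (minutes).
even = [30, 30]     # 60 minutes in total, shared equally
skewed = [2, 56]    # 58 minutes in total, concentrated on one flight

print(sum(even), network_cost(even))
print(sum(skewed), network_cost(skewed))
```

Under this assumed cost curve, the second allocation has fewer total delay minutes but a higher total cost, because long delays are penalised more than proportionally.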

By tackling these challenges and overcoming these barriers, we may hope to initiate a paradigm shift in the scope and usefulness of the set of metrics currently employed in air transport and ATM, and a concomitant enhancement in the way we quantify and understand performance, and thus how it may be improved.

## Case studies

### Better understanding ATM system dynamics through a step-change improvement in mapping the network topologies

This case study is underpinned by a step-change improvement in the way we characterise the ATM network topology through the use of new data, thus meeting many of the research challenges identified in sections 4.2.1.3 and 4.2.2.3.

Figure 4.4 is the cumulative probability plot of total passengers, connecting passengers and aircraft movements for European airports in 2010. Due to non-reporting of transfer passengers to ACI, however, some key airports are omitted. We use these values as a working proxy for degree, k. Linear fits (on the double-logarithmic scale) are for the 80% cumulative values of k (a large proportion of the movements/passengers are accounted for by a small number of large hubs). For total passengers and aircraft movements, the values of α (see Section 5.2.1.2) correspond to values of γ of 2.01 and 2.14, respectively (γ = α + 1). This compares with values of γ for scale-free networks cited by Barabási et al. (1999) in the range 2 – 4 (for the internet, metabolic reaction networks, and a telephone call graph) and by Strogatz (2001) of 2.1 – 2.4. The connecting passenger value is short of this range. This is only a rather crude attempt to characterise the network, relying on the number of movements and (connecting) passengers to represent the degree (an approximation also made by Amaral et al. (2000), as discussed in Section 4.2.1.2).
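A linear fit on a double-logarithmic cumulative plot can be sketched as follows. The symbols are assumptions made for this example: if the cumulative probability P(K ≥ k) falls off as k^(−α), the corresponding degree distribution falls off as k^(−γ) with γ = α + 1. The data are synthetic, generated to lie exactly on a power law, and the fit is an ordinary least-squares line over all points (the restriction to the 80% cumulative values is not reproduced here).

```python
import math

def ccdf_slope(values):
    """Least-squares slope of log P(K >= k) against log k (assumes distinct values)."""
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    xs = [math.log(k) for k in ranked]
    # The largest value has empirical P(K >= k) = 1/n, the smallest has n/n.
    ys = [math.log((rank + 1) / n) for rank in range(n)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic 'airport sizes' lying exactly on a power law (illustrative only).
alpha_true = 1.1
n = 200
sizes = [1000.0 * (r / n) ** (-1.0 / alpha_true) for r in range(1, n + 1)]

alpha_hat = -ccdf_slope(sizes)   # exponent of the cumulative distribution
gamma_hat = alpha_hat + 1.0      # exponent of the degree distribution
```

With real data, least-squares fits on double-logarithmic plots should be treated with caution, and the choice of fitting range can change the estimated exponent noticeably.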

It is logical that the functionality of airports is based not only on characteristics such as their size and number of passengers and/or movements, but also on their geographical location and the specific factors which have contributed to their development. Málaga and Palma de Mallorca are good examples of airports handling very large numbers of passengers (over 10 million in 2010) but for which neither the high volumes nor low connecting passenger numbers could be predicted by inspection of the European network from a simple (unweighted) topological point of view showing destinations served. These airports, serving popular holiday destinations and with large volumes of passengers carried by the very strong low-cost carrier presence, thus belong to a particular class of airport with a corresponding modus operandi of preferential attachment. Paleari et al. (2010) comment how despite the emergence of a point-to-point structure and greater average degree due to the rise of low-cost carriers, the European network still has a lower clustering coefficient than the US or China. Similar classes could be used to describe: (i) airports serving as connection hubs, often in the context of an airline alliance network; (ii) airports not acting as connecting hubs but serving as high intensity centres for largely point-to-point traffic (e.g. for low-cost carriers); (iii) airports with what may be described as ‘mixed’ characteristics.

The air transport network is not in developmental stasis. Not only do technological and operational changes through SESAR shape the evolution of ATM (thus directly influencing preferential attachment), but developing markets, changes in policy and regulations, and capacity saturation issues also (subtly) remould the network topology.

Hubs are particularly prone to acting as delay multipliers in the ATM network, since they often operate at, or near to, capacity and have multiple dependencies between flights. This may be in contrast to other complex networks, where hubs are more readily protected or otherwise resilient. Keller (2005) aptly describes this as the ‘Achilles’ heel’ of certain networks. Although it has hardly been researched as yet, there may well be value in better understanding the role of other high-volume nodes (such as Málaga and Palma de Mallorca) in terms of their contribution to network delay, even though they are probably less likely to act as delay multipliers per se, due primarily to the lack of the multiple dependencies that prevail at hubs.

There is a good opportunity to examine the European network topology with actual passenger flows and full connectivities through the use of commercially available data. This would give valuable insights into the network topology by both resolving the shortcomings of the ACI data (due to missing reporting) and, indeed, vastly extending the richness of the data at the disaggregate level.

Such a case study to further the state of the art could include the building of a time-line network/graph where each node is an airport arrival or departure, each link (or edge) is a flight. A complication here is that several nodes represent the same airport, but this is a valuable network representation in terms of computing temporal metrics.
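A minimal sketch of such a time-line network follows. The schedule, the airport codes and the flight names are hypothetical, and times are in minutes after midnight. Each node is a departure or arrival event and each link is a flight. As one example of a temporal metric, the sketch also computes an earliest arrival time by scanning flights in departure order; the minimum connection time used there is an additional assumption that does not come from the text above.

```python
from collections import defaultdict

# Hypothetical schedule: (flight, origin, departure, destination, arrival).
flights = [
    ("F1", "MAD", 480, "CDG", 600),
    ("F2", "CDG", 660, "FRA", 740),
    ("F3", "CDG", 610, "FRA", 690),
    ("F4", "MAD", 500, "FRA", 720),
]

nodes = set()   # each node is an (airport, time, kind) event
edges = []      # each link is one flight, from its departure to its arrival event
for name, orig, dep, dest, arr in flights:
    dep_node, arr_node = (orig, dep, "dep"), (dest, arr, "arr")
    nodes.update([dep_node, arr_node])
    edges.append((dep_node, arr_node, name))

# Several nodes represent the same airport: group the events per airport.
events_at = defaultdict(list)
for airport, time, kind in sorted(nodes):
    events_at[airport].append((time, kind))

def earliest_arrival(flights, origin, destination, start, min_connect=30):
    """Earliest arrival time, scanning flights in order of departure.

    min_connect is an assumed minimum connection time at intermediate airports.
    Returns None if the destination cannot be reached.
    """
    best = {origin: start}
    for name, orig, dep, dest, arr in sorted(flights, key=lambda f: f[2]):
        ready = best.get(orig)
        if ready is None:
            continue  # the origin of this flight has not been reached yet
        buffer = 0 if orig == origin else min_connect
        if dep >= ready + buffer and arr < best.get(dest, float("inf")):
            best[dest] = arr
    return best.get(destination)

print(earliest_arrival(flights, "MAD", "FRA", start=450))
print(earliest_arrival(flights, "MAD", "FRA", start=450, min_connect=10))
```

In this toy schedule, shortening the assumed connection time makes the connection via CDG feasible and brings the earliest arrival forward, which is the kind of time-dependent effect a static airport-to-airport graph cannot express.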

We could expect to extract relevant information from the joint analysis of the different scales. If a specific network of causality were to be constructed for aircraft delay, with nodes representing airports, then a pair of nodes would be (causally) connected if we detected some causality between the evolution of delays at the two airports. Initially, the focus could be on the global structure of such a network: is it densely connected, are there strong relationships between the delays of different airports, or does it seem that they evolve randomly? Secondly, the microscale could be examined, trying to identify whether some nodes are more responsible than others for the dynamics of the system. A further key question is whether we can learn anything from the mode of preferential attachment, in the fullest sense, to help develop the network in a more resilient manner (at least to some tangible extent).

### Quantifying the trade-offs between metrics: both classical and non-classical

In Section 4.2.2.3 we identified numerous research challenges to be met regarding the development of new metrics in ATM. Many of these have to be addressed in order to advance the state of the art. In this case study, we suggest a further, specific area of research, focusing on the trade-offs between different types of classical and non-classical metrics. This will indeed be a significant challenge, since even the trade-offs between classical metrics (such as flight delay and flight planning flexibility) are poorly understood, before we include the range of newly forming complexity metrics to complement this task. As discussed, and in parallel, we will also need to extend the range of both types of metric to embrace delay propagation and passenger-centricity, also through metrics that capture positive qualities of the transportation network such as mobility and passenger utility (value-of-time based). Several metrics related to trip duration and mobility could be calculated, such as a minimum spanning tree [11] from a given airport and geodesics (i.e. shortest paths).
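Both quantities can be computed with standard algorithms. The sketch below uses Prim's algorithm for a minimum spanning tree grown from a given airport and Dijkstra's algorithm for shortest paths. The four-node network and its link weights (which could stand for travel times) are hypothetical.

```python
import heapq

def prim_mst(graph, root):
    """Minimum spanning tree grown from root; returns a list of (u, v, weight)."""
    visited = {root}
    frontier = [(w, root, v) for v, w in graph[root].items()]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(frontier, (w2, v, nxt))
    return tree

def dijkstra(graph, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical weighted, undirected network (weights are illustrative).
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

tree = prim_mst(graph, "A")
distances = dijkstra(graph, "A")
```

In practice, the choice of weight (frequency, seats offered, distance or travel time) determines what the resulting mobility metric actually measures.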

In a flight-centred graph model, nodes would be flights and passengers would be edges (i.e. two nodes are connected if, and only if, they share a passenger). Passenger trips may be considered as paths (a sequence of edges connected by nodes). In this representation of the nodes, the temporal information would be lost, but we would obtain a clearer picture of passenger trip chains (and the dependence between flights and passengers).
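A minimal sketch of this flight-centred representation is given below. The passenger itineraries and flight names are hypothetical. Two flights are linked if, and only if, at least one passenger travels on both, and each link records which passengers it represents.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical itineraries: passenger -> ordered list of flights taken.
itineraries = {
    "pax1": ["F1", "F2"],
    "pax2": ["F1", "F3"],
    "pax3": ["F2"],
    "pax4": ["F1", "F2", "F4"],
}

# Link every pair of flights that share a passenger.
shared = defaultdict(set)
for pax, legs in itineraries.items():
    for a, b in combinations(legs, 2):
        shared[frozenset((a, b))].add(pax)

def neighbours(flight):
    """Flights that share at least one passenger with the given flight."""
    return {other for pair in shared if flight in pair for other in pair - {flight}}

print(sorted(neighbours("F1")))
print(sorted(shared[frozenset(("F1", "F2"))]))
```

The number of passengers on a link could serve as its weight, giving a simple measure of how strongly two flights depend on one another.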

Collinearities will be evident both within metric groups and between them. For example, four broad groups of delay metric can be classified, in terms of their centricity (or orientation):

• flight;
• propagation;
• passenger (cost to airline);
• passenger (value of time).

Between these groups, the aircraft (which propagate delay and on which passengers travel) are the common denominators driving the correlations. Although many such correlations will be positive, some may, at first sight at least, appear somewhat counterintuitive, in that they express negative correlations. Examples include:

• holding several flights for an in-bound delayed flight could improve net passenger delay cost (a new metric) but worsen aircraft delay minutes (an existing metric);
• similarly, comparing different rationing rules in a model ground delay program rationing rule simulator, it was found (Manley and Sherry, 2008) that passenger delays could be significantly decreased with a slight increase in total flight delay; rationing by passengers on-board decreased total passenger delay by 22%, with only a 1.1% increase in total flight delay;
• cancelling flights may help to mitigate the propagation of delay through the network; although cancellations reduce flight delay (relative to operating a (very) late flight) they markedly increase passenger delay;
• Pyrgiotis et al. (2010) cite how delay propagation tends to smooth daily airport demand profiles, pushing more demand into the late evening, resulting in local delays which are smaller than they would have been with the original demand profiles, especially at hub airports – not only did the expected total delay increase later in the day but its variance increased greatly as well.
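The first of these examples can be worked through with simple arithmetic. The sketch below is a toy calculation under stated assumptions: all passenger numbers, the hold needed to protect the connection, and the rebooking delay are hypothetical, and delay is measured against the outbound flight's schedule only.

```python
def hold_decision(hold_min, local_pax, connecting_pax, needed_hold, rebooking_delay):
    """Return (flight delay in minutes, passenger delay in passenger-minutes).

    needed_hold: hold (minutes) required for the connecting passengers to board.
    rebooking_delay: delay (minutes) suffered by each passenger who misses the flight.
    """
    makes_connection = hold_min >= needed_hold
    on_board = local_pax + (connecting_pax if makes_connection else 0)
    pax_delay = on_board * hold_min
    if not makes_connection:
        pax_delay += connecting_pax * rebooking_delay
    return hold_min, pax_delay

# Hypothetical case: 100 local passengers, 40 connecting from a late inbound flight.
no_hold = hold_decision(0, local_pax=100, connecting_pax=40,
                        needed_hold=30, rebooking_delay=240)
with_hold = hold_decision(30, local_pax=100, connecting_pax=40,
                          needed_hold=30, rebooking_delay=240)
print(no_hold, with_hold)
```

Under these assumptions, holding the flight worsens the flight-centric metric (30 minutes instead of zero) but more than halves the passenger-centric one, which is the kind of negative correlation described above.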

These types of trade-offs are evident between many of the ICAO KPAs adopted by SESAR, mentioned in Section 4.1.3, and applying complexity science to help understand the complex non-linearities should improve decision-making and prioritisation among KPAs: it is evident that we cannot simply improve all of them simultaneously.

### Comparing complexity science tools with non-complexity methods in describing key system dynamics

In this case study, we propose that complexity science tools, many of which have been discussed in preceding sections of this paper, be compared in terms of the power of the results obtained in better characterising ATM performance, with other methods and corresponding metric designs, both classical and non-classical.

The node selection process mentioned in Section 4.2.1.1 could include a basic data mining approach such as a factor analysis of delay data, thus affording derived, non-classical metrics. Factor analysis attempts to express a set of observed, correlated variables as a new set of independent variables, or ‘factors’. It reveals intercorrelations between the original variables, one of which, in this case, would be the delay magnitude. As a result, and usually after several iterations of the algorithm, a solution is found which best describes the emergent delay. The task of the analyst is then to ascribe meaning to each new factor (e.g. ‘a sequence of turnaround delays impacting high load-factor flights, which then failed to make their ATFM slots’). In practice, many of the factors that emerge from such analyses are difficult to interpret, but it is the act of interpretation itself, and the remodelling applied by the researcher, which, it is hoped, deepens the understanding of the underlying mechanisms involved.

Due to the nature of many derived metrics, we are more likely to make unexpected findings, and to deepen our understanding through being prompted to explain counterintuitive results, than we would through the use of classical metrics alone and without the context of complexity science. Some important analogies emerge between the choice of factors in a factor analytic model and the choice of nodes in a graph theoretic model, for example. Keller (2005) addresses some misuses of the tenets of complexity science, and cautions against reading too much into goodness of fit, not least with deductions made from double-logarithmic plots and concerning preferential attachment: “Mathematical modelling generally begins with the attempt to formulate a model that reproduces a set of data, but the risks of inferring inversely — from data back to model — are notorious”. Other researchers (for example, Nacher et al. (2009), in a model for protein–protein interaction networks supported by experimental data) have shown that connectivity can follow scale-free distributions even in the absence of preferential attachment.

It may be instructive to take specific examples of outputs and results from the application of complexity science in ATM and to see to what extent they may be reproduced by other methods of characterisation, by non-complexity metrics and tools. This echoes somewhat previous proposals put forward within the context of this paper: “A complementary approach can be used to investigate the temporal propagation of disturbances. In fact, having selected the appropriate variables/proxies, standard time series analysis tools can be used in order to reveal whether or not disturbances have a clustered structure in time. It is worth mentioning that such an approach can be applied both to the variables/proxies describing disturbances and for the time series obtained by characterising the networks constructed at different time ranges” (ComplexWorld SESAR WP-E Research Network, 2011). Of particular interest would be to explore how well non-complexity metrics/methods capture certain features of system dynamics (such as uncertainty and propagation), compared with those of complexity science. Such a case study may help to compellingly stress the specific benefits of complexity techniques, by throwing the outcomes into focus with non-complexity methods, thus allowing researchers in ATM to propose more specific benefits for other disciplines and to foster improved outreach beyond the field. Such a meta-methodological approach also mitigates what is sometimes referred to as ‘research enculturalisation’, whereby a field of research adheres too narrowly to its own received wisdoms and culture.

## References

• Amaral L. A. N., Scala A., Barthélémy M. and Stanley H. E., 2000. Classes of small-world networks, Proceedings of the National Academy of Sciences of the United States of America, 97 (21), 11149–11152.
• Albert R., Jeong H. and Barabási A.-L., 2000. Error and attack tolerance of complex networks, Nature (Letters), 406, 378–382.
• Baden W., DeArmon J., Kee J. and Smith L., 2006. Assessing schedule delay propagation in the national airspace system, 47th Annual Transportation Research Forum, New York University, New York, 1–17.
• Barabási A.-L. and Albert R., 1999. Emergence of scaling in random networks, Science, 286 (5439), 509–512.
• Barabási A.-L., Albert R. and Jeong H., 1999. Mean-field theory for scale-free random networks, Physica A, 272, 173–187.
• Bratu S. and Barnhart C., 2004. An analysis of passenger delays using flight operations and passenger booking data, Sloan Industry Studies Working Paper WP-2004-20, 1–24.
• ComplexWorld SESAR WP-E Research Network, 2011. Spatio-temporal propagation of disturbances in ATM systems, D3.5 Complex ATM White Paper, Issue 1, Annex III(7), 48–50.
• Cook A. and Tanner G., 2011. European airline delay cost reference values. Commissioned by EUROCONTROL Performance Review Unit, Brussels, 1–86.
• European Commission, 2004. Regulation (EC) No 261/2004 of the European Parliament and of the Council of 11 February 2004 establishing common rules on compensation and assistance to passengers in the event of denied boarding and of cancellation or long delay of flights, and repealing Regulation (EEC) No 295/91, Official Journal L 046, 17 February 2004, 1–8.
• European Commission, 2011a. Flightpath 2050 - Europe’s Vision for Aviation (Report of the High Level Group on Aviation Research), ISBN 978-92-79-19724-6, DOI 10.2777/50266, 1–23.
• European Commission, 2011b. White Paper: Roadmap to a Single European Transport Area – Towards a competitive and resource efficient transport system, Brussels, 1–30.
• European Commission, 2011c. Possible revision of Regulation (EC) 261/2004 on denied boarding, long delays and cancellations of flights, Roadmap Version 1, November 2011, 1–5.
• Girvan M. and Newman M. E. J., 2002. Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America, 99 (12), 7821–7826.
• Jetzki M., 2009. The propagation of air transport delays in Europe, Doctoral thesis, Department of Airport and Air Transportation Research, RWTH Aachen University, Germany, 1–107.
• Kantardzic M., 2011. Data mining – concepts, models, methods, and algorithms (2nd Ed.), John Wiley & Sons, 1–25.
• Keller E. F., 2005. Revisiting ‘scale-free’ networks, BioEssays, 27, 1060–1068.
• Liljeros F., Edling C.R., Amaral L. A. N., Stanley H. E. and Åberg Y., 2001. The web of human sexual contacts, Nature (Brief Communications) 411, 907–908.
• Lillo F., Miccichè S., Mantegna R. N., Beato V. and Pozzi S., 2011. ELSA project: toward a complex network approach to ATM delays analysis, First SESAR Innovation Days (l’Ecole Nationale de l’Aviation Civile (ENAC), Toulouse), 1–7.
• Manley B. and Sherry L., 2008. The impact of ground delay program (GDP) rationing rules on passenger and airline equity, Third international conference on research in air transportation, Fairfax VA, 1–9.
• Nacher J. C., Hayashida M. and Akutsu T., 2009. Emergence of scale-free distribution in protein–protein interaction networks based on random selection of interacting domain pairs, BioSystems, 95 (2), 155–159.
• Paleari S., Redondi R. and Malighetti P., 2010. A comparative study of airport connectivity in China, Europe and US: Which network provides the best service to passengers?, Transportation Research Part E, 46 (2), 198–210.
• Palla G., Derényi I., Farkas I. and Vicsek T., 2005. Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435, 814–818.
• Pyrgiotis N., 2011. A public policy model of delays in a large network of major airports, Transportation Research Board 90th Annual Meeting, Washington DC, 1–26.
• Pyrgiotis N., Malone K. and Odoni A., 2010. Modelling delay propagation within an airport network, 12th World conference on transport research, Lisbon, Portugal, 1–26.
• SESAR Consortium, 2006. SESAR Definition Phase: Milestone Deliverable 2, Air Transport Framework - The Performance Target, December 2006, 1–100.
• SESAR Consortium, 2008. SESAR Definition Phase: Milestone Deliverable 5, SESAR Master Plan, April 2008, 1–123.
• Strogatz S. H., 2001. Exploring complex networks, Nature (Insight review articles), 410, 268–276.
• Watts D. J. and Strogatz S. H., 1998. Collective dynamics of ‘small-world’ networks, Nature, 393, 440–442.
• White D. R. and Borgatti S. P., 1994. Betweenness centrality measures for directed graphs, Social Networks, 16 (4), 335–346.

## Notes

2. It is out of scope to discuss this here. We cite Nacher et al. (2009) as one of many examples of such work in this field, since we will refer in passing to their work later.
3. Stated to be reasonable because the number of seats in aircraft does not follow a power-law distribution.
4. Stated to be reasonable because at most there will be “about 20 flights per day” and per airline between any two cities and thus the distribution of the number of flights per day between two cities is bounded.
5. Also cited by Lillo et al. (2011).
6. The Pareto distribution is a power-law probability distribution which describes certain observable phenomena. A double Pareto means the function is split into two parts, each with its own exponent (in this context, above and below a certain degree). Double Paretos also describe certain observable phenomena.
7. Primarily crewing, maintenance and passenger connection issues.
8. Whereby, in essence, outbound delays are worse than inbound delays; the opposite case may be described as a delay ‘sink’.
9. Available space here precludes a full literature review on this specific point. Such is available from the authors, on request, however.
10. Available space here precludes a formal presentation discussing adjacency matrices and weights. Such is available from the authors, on request, however.
11. A tree is a connected graph in which any two nodes are connected by exactly one simple path (a path with no repeated vertices), or, more simply, a graph without any loop. A subgraph that reaches out (spans) to all nodes of a graph is called a spanning subgraph. A subgraph that is a tree and that spans to all nodes of the original graph is a spanning tree. Of the set of spanning trees of a weighted, connected graph, the one(s) with minimum total weight is (are) (a) minimum spanning tree(s). If a graph is unweighted, any spanning tree is a minimum spanning tree.