Another key assumption of compartmental models is that populations are homogeneously mixed, meaning every individual has an equal chance of contacting any other individual and all animals have the same average contact rate. In many modern production systems this is unrealistic. Animals are commonly segregated into management groups by age and physiological status, and herds are geographically separated, which strongly shapes who contacts whom and how infection can spread. In this module, we expand traditional compartmental models by introducing network-based approaches and spatial transmission kernels that better represent structured contact patterns and demographic organisation at the between-herd and industry level.
By the end of this module, participants should be able to:
Simple compartmental models have been used for many decades to represent infectious disease dynamics in populations. One of their fundamental limitations is the assumptions they make about the contact structure of populations (homogeneous mixing). Specifically, they assume (1) that all individuals in a population have an equal chance of coming into contact with any other individual, (2) that all individuals have the same average contact rate, and (3) often that the probability of transmission is the same for all kinds of contact. Furthermore, demographic events such as births, deaths, and transitions between management stages are typically modelled as occurring at a continuous rate over time.
These are probably reasonable assumptions for modelling disease dynamics in small, well-mixed populations with year-round production (such as swine breeding herds or commercial poultry flocks). They are also reasonable for modelling within-herd transmission of diseases (such as foot-and-mouth disease) that are so highly infectious that all animals on an affected farm are, for all practical purposes, at equal risk of exposure. However, they break down when you are modelling disease dynamics at an industry level, where each farm is its own discrete epidemiological unit and the contact patterns between farms are highly heterogeneous. We call this type of contact structure a metapopulation, or a population of subpopulations. Herds that segregate animals into discrete management groups with different within- and between-group contact rates can also be described as metapopulations.
For example, we know intuitively that a farm in Northland is much more likely to trade animals with a farm in Waikato than with a farm in Canterbury. We also know from cross-sectional contact surveys and analyses of national cattle movement data that a very small number of herds are responsible for a disproportionately large share of the total movements between herds in the industry. This clearly violates the homogeneous mixing assumptions, so we need different approaches for modelling disease transmission dynamics – time to think outside the proverbial box!
There are two common approaches to modelling the spread of disease between individual herds: (1) modelling the discrete movements of animals between locations for trade, grazing, or other purposes and (2) modelling a continuous diffusion process where the risk of another farm getting infected decreases as some function of distance from an infected farm.
Network analysis has emerged as a powerful framework for understanding how the connections between individuals in a population influence the spread of disease. One of the field's most famous early landmarks is the “six degrees of separation” experiment by psychologist Stanley Milgram in 1967. In this experiment, letters were given to selected people in Kansas and Nebraska (Midwestern United States). Each person was instructed to mail the letter to a personal acquaintance who they thought would be most likely to know a target person in Boston (eastern United States).
The letters were forwarded along in a chain, with each recipient adding his or her name to a list, until they reached their final destination. Completed chains took an average of only about six contacts, which supported the hypothesis that we live in a highly connected world. Since then, there has been tremendous interest in studying contact networks across a wide range of biological systems. There is now even a parlour game called “Six Degrees of Kevin Bacon” that traces connections within the global film industry.
The first applications of network analysis to infectious disease epidemiology were in the early 1990s, with studies of HIV transmission through sexual contact networks and IV drug user networks. These studies found that a small number of individuals with a large number of connections were likely responsible for most transmission events. The field expanded to veterinary epidemiology in the early 2000s, when data from national cattle identification and tracing systems in the European Union became available. For the first time, this provided a virtually complete picture of all cattle farms in a single country and all the animal movements between them, which has greatly advanced our understanding of disease epidemiology. These tracing systems have now been implemented in many countries and for many different animal species, including sheep, swine, fish, and poultry. Network analysis is now even being used to study disease transmission in plant and wildlife communities.
There are two fundamental building blocks in any contact network:
Each node can also have its own set of attributes that describe important epidemiological features such as:
When building a network dataset, each node in the network should have a unique ID number and the information about the nodes should be stored in a separate spreadsheet.
If the relationship has a clear directionality such as the movement of animals FROM one farm TO another farm, the edge is said to be directed. If the relationship is bidirectional such as nose-to-nose contact over a fence, the edge is said to be undirected. Each edge can also have its own set of attributes that describe important epidemiological features such as:
These attributes can be used to assign a weight to the edge that describes the strength of the relationship between the two nodes.
Information about edges is also usually stored in a spreadsheet. The ID numbers for the origin and destination nodes for the contacts should be coded the same as in your node worksheet and placed in the first two columns.
Most network analysis programmes let you specify whether the network is directed or undirected. If not, make sure that in an undirected network an edge between node 1 and node 5 is coded in the spreadsheet as both an edge from 1 to 5 and an edge from 5 to 1.
The other common method for storing edge data is an adjacency matrix, with the rows representing the origin node IDs and the columns representing the destination node IDs. Each cell in the matrix contains a one if there is an edge between the two nodes and a zero if there is no edge. If the matrix is symmetrical (the value in row i, column j always equals the value in row j, column i), then you know the network is undirected.
If there is more than one type of node and/or edge in the network (e.g. movement contacts and fenceline contacts), the network is said to be multi-modal. Analyses of these networks are slightly more complicated because each node and/or edge type carries a different epidemiological risk.
Networks are then typically drawn with circles representing the nodes and arrows (for directed edges) or lines (for undirected edges) representing the relationships between them. The nodes can be plotted by their spatial location (if available) or arranged strategically to make the patterns easier to see.
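As a minimal sketch of the two storage formats, the Python below converts a small edge list into an adjacency matrix. It also codes an undirected edge list in both directions and applies the symmetry check described above. The node IDs, edges and function names are hypothetical and for illustration only.

```python
# Hypothetical node IDs and edge list (origin, destination), for illustration only
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 1), (1, 5), (5, 1)]

def adjacency_matrix(nodes, edges):
    """Rows = origin node, columns = destination node; 1 if an edge exists, else 0."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for origin, destination in edges:
        matrix[index[origin]][index[destination]] = 1
    return matrix

def is_symmetric(matrix):
    """True if every edge i -> j is matched by j -> i."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

def make_undirected(edges):
    """Code each undirected edge in both directions (1 -> 5 and 5 -> 1)."""
    return sorted(set(edges) | {(d, o) for o, d in edges})

A = adjacency_matrix(nodes, edges)
for row in A:
    print(row)
print(is_symmetric(A))                                                # False
print(is_symmetric(adjacency_matrix(nodes, make_undirected(edges))))  # True
```

Note that a symmetric matrix only tells you every edge is reciprocated. It cannot distinguish a truly undirected network from a directed one in which all contacts happen to go both ways.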
Draw a graphical representation of the networks for each of the following edge lists.
It is also important to distinguish between static networks where the nodes and relationships remain fixed over a time period and dynamic networks where the patterns constantly change over time. An example of a fairly static network would be dairy farms connected through the movements of a milk truck, which visits each premises along a fixed route each day. Cattle movement networks on the other hand are quite dynamic with most connections occurring only once in time and never being repeated.
As with any data type in epidemiology, there are certain descriptive statistics we calculate to characterize the behaviour of networks. First and foremost is simply the total number of nodes and edges in the network (stratified by type in the case of multimodal networks). The remaining measures are mostly geared towards identifying individual nodes and edges that are highly connected (central) in the network.
The more connections into a node (in-degree), the more likely it is to acquire disease, and the more connections out of a node (out-degree), the more likely it is to spread disease. Many biological networks are described as scale-free: their degree distribution approximately follows a power law, meaning the majority of nodes have relatively few contacts while a very small number are highly connected and act as hubs.
Betweenness centrality can also be applied to edges in the network. In this case, it measures the number of shortest paths between pairs of nodes that pass through the edge. From a disease surveillance and control perspective, we are particularly interested in identifying these highly connected nodes and edges because they are likely responsible for most disease transmission events.
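A quick way to get in-degree and out-degree from an edge list is to count how often each node appears as a destination or an origin. The sketch below uses a small hypothetical movement list. A useful sanity check is that the in-degrees and out-degrees each sum to the total number of edges, because every directed edge has exactly one origin and one destination.

```python
from collections import Counter

# Hypothetical directed movement edge list: (origin, destination)
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "C"), ("E", "C")]

out_degree = Counter(origin for origin, _ in edges)           # edges leaving each node
in_degree = Counter(destination for _, destination in edges)  # edges arriving at each node
nodes = sorted(set(out_degree) | set(in_degree))

for node in nodes:
    # A Counter returns 0 for nodes never seen in that role
    print(node, "in:", in_degree[node], "out:", out_degree[node])

# Every directed edge adds one to an out-degree and one to an in-degree
assert sum(in_degree.values()) == sum(out_degree.values()) == len(edges)
print("Most likely to acquire:", max(nodes, key=lambda n: in_degree[n]))  # C
print("Most likely to spread:", max(nodes, key=lambda n: out_degree[n]))  # C
```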
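Node and edge betweenness can both be computed with Brandes' algorithm, a standard method that is not itself part of this course material. The sketch below assumes an unweighted, directed network with no duplicate edges and returns raw (unnormalised) counts. When several shortest paths tie, they share the credit equally. In the hypothetical network, the bridging edge C → D carries every shortest path from the {A, B, C} side to the {D, E, F} side.

```python
from collections import defaultdict, deque

def betweenness(nodes, edges):
    """Brandes' algorithm for an unweighted, directed network without duplicate edges.
    Returns raw (unnormalised) node and edge betweenness."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    node_bc = {v: 0.0 for v in nodes}
    edge_bc = {e: 0.0 for e in edges}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma) to each node
        order, preds = [], defaultdict(list)
        sigma = defaultdict(int)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing credit along shortest paths
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge_bc[(v, w)] += share
                delta[v] += share
            if w != s:
                node_bc[w] += delta[w]
    return node_bc, edge_bc

# Hypothetical network: two small groups joined by the single edge C -> D
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]
node_bc, edge_bc = betweenness(nodes, edges)
print(node_bc)              # C and D score 6.0; all other nodes 0.0
print(edge_bc[("C", "D")])  # 9.0
```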
Calculate the in-degree and out-degree for each of the nodes in the following network. How many nodes and edges are there in total? Which node is most likely to acquire disease? Which node is most likely to spread disease? How many edges are reciprocal? Is there any clustering in the network?
| Node | In-degree | Out-degree |
|---|---|---|
| A | 0 | 1 |
| B | 0 | 1 |
| C | 0 | 1 |
| D | 2 | 1 |
| E | 2 | 3 |
| F | 5 | 4 |
| G | 2 | 1 |
| H | 1 | 0 |
| I | 1 | 1 |
| J | 0 | 1 |
| K | 2 | 0 |
| L | 1 | 1 |
| M | 1 | 0 |
There are a total of 13 nodes and 12 edges. F is the node most likely to acquire and spread disease. 4 edges are reciprocal. Nodes G, F, and K are clustered.
When you put the basic network building blocks together, you get some quite interesting network patterns (also called network topology). These are very different from the traditional compartmental modelling approaches to representing population contact structures, where every individual is assumed to have the same number of contacts and the same probability of making contact with any other individual in the population. In the example below of a cattle contact network, you can see evidence of a skewed degree distribution: a small number of darker nodes act as hubs, while the majority of lighter nodes contribute very few contacts. Can you identify any particular nodes or edges that have a central role in this network?
Many descriptive metrics have been developed to help characterize these broader patterns, and many research studies have examined what they mean for infectious disease transmission. Most metrics are derived from the mathematical field of graph theory, which is why their names and interpretations can be challenging to make sense of in a biological setting.
To calculate density, you simply divide the number of cells with the value 1 by the total number of cells in the matrix (i.e. the number of directed edges divided by (number of nodes)²). In the example above, the density is 7 / 25, or 28%. In most livestock movement networks, densities are typically less than 0.01%.
To calculate fragmentation, you go through each node in the network and check whether you can trace a sequence of edges connecting it to every other node in the network. The numerator is the total number of node pairs that cannot be connected, and the denominator is again (number of nodes)². Most livestock movement networks have a fragmentation of about 90% or more.
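The density calculation above is a one-line formula. The sketch below reproduces the worked example (7 edges among 5 nodes). For comparison, it also shows the variant that many network packages use for directed graphs (for example, the networkx `density` function). That variant excludes the diagonal of self-loops and divides by n(n − 1) instead of n². Check which convention your software uses before comparing densities between studies.

```python
def density_all_cells(n_nodes, n_directed_edges):
    """Filled cells divided by all n x n cells of the adjacency matrix (as in the text)."""
    return n_directed_edges / n_nodes ** 2

def density_no_self_loops(n_nodes, n_directed_edges):
    """Alternative convention: the diagonal (self-loops) is excluded, giving n(n - 1) cells."""
    return n_directed_edges / (n_nodes * (n_nodes - 1))

print(density_all_cells(5, 7))      # 0.28
print(density_no_self_loops(5, 7))  # 0.35
```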
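Fragmentation can be computed by running a breadth-first search from every node to find everything it can reach along directed edges. This minimal sketch assumes a directed network. As an explicit assumption, it divides by the number of ordered pairs of distinct nodes, n(n − 1), so that a network with no edges scores exactly 1. For networks with thousands of herds, the difference from (number of nodes)² is negligible. The example networks are hypothetical.

```python
from collections import defaultdict, deque

def reachable_from(source, succ):
    """All nodes that can be reached from source by following directed edges."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def fragmentation(nodes, edges):
    """Proportion of ordered pairs of distinct nodes with no connecting path."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    unreachable = 0
    for s in nodes:
        reach = reachable_from(s, succ)
        unreachable += sum(1 for t in nodes if t != s and t not in reach)
    n = len(nodes)
    return unreachable / (n * (n - 1))

nodes = ["A", "B", "C", "D", "E"]
print(fragmentation(nodes, [("A", "B"), ("B", "C"), ("D", "E")]))         # 0.8
print(fragmentation(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")]))  # 0.0
```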
Intuitively, diseases can spread more readily through networks that are dense and well connected than networks that are fragmented.
When disease is introduced to a highly clustered network, it tends to rapidly circulate around the clustered individuals and take longer to circulate to the more disconnected parts of the network.
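This section does not define clustering formally. One common measure, from Watts and Strogatz, is the local clustering coefficient: the proportion of pairs of a node's neighbours that are also connected to each other. The sketch below computes it on a small hypothetical network and averages it across nodes. It ignores edge direction, which is an assumption.

```python
from itertools import combinations

# Hypothetical undirected network: a triangle A-B-C with D attached to C
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

neighbours = {}
for u, v in edges:
    neighbours.setdefault(u, set()).add(v)
    neighbours.setdefault(v, set()).add(u)

def local_clustering(node):
    """Share of pairs of the node's neighbours that are themselves linked."""
    nbrs = neighbours[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than 2 neighbours; reported as 0 here
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return linked / (k * (k - 1) / 2)

for node in sorted(neighbours):
    print(node, round(local_clustering(node), 3))
average = sum(local_clustering(n) for n in neighbours) / len(neighbours)
print("average clustering:", round(average, 3))  # 0.583
```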
Epidemics on assortative networks (where high-degree nodes tend to connect to other high-degree nodes) have faster initial growth rates and shorter durations. This is because the high-degree nodes tend to get infected very quickly, after which few susceptible nodes remain to continue spreading the disease. Livestock networks are generally mildly disassortative.
In theoretical models of epidemic spread, average path length is positively correlated with the time taken to reach maximum epidemic size: the ‘closer’ nodes are in network distance, the faster disease can spread between them. Many real-world biological and social networks are structured so that relatively few steps are needed to connect any two nodes. Watts and Strogatz described this as the small-world phenomenon, in which network topology is characterized by local clustering of contacts with occasional long-distance jumps bridging otherwise isolated parts of the network. Highly connected nodes, or hubs, are instrumental in decreasing average path length.
By definition, an epidemic seeded in the giant strongly connected component (GSCC) has the potential to spread to all other nodes in the GSCC. The size of the GSCC has therefore been widely used as a lower bound for the maximum size of an epidemic.
There are many different algorithms that have been used to identify these communities (groups of nodes that are more densely connected to each other than to the rest of the network). From an epidemiological perspective, we would expect disease to circulate more readily within a community, especially after a new disease introduction. The example below shows trade communities in the German pork production network. One of the key insights here is that these communities often don’t align well with the traditional geographic or administrative boundaries in a country, which are so often used to define disease control regions.
One of the big limitations in network analysis is that most of the individual node and topology measures are designed for static networks, where the links are fixed and continuous over a given time period. As previously mentioned, the edges in most cattle and sheep contact networks are dynamic, with most occurring only once and never being repeated. When you start putting dates against the edges, the network becomes far less connected than it appears, because disease can only travel along sequences of contacts that occur in the right chronological order. This is less of an issue for continuous-flow production systems like swine and poultry, where contacts are more fixed and regular.
Another limitation is that many of these measures don’t allow you to weight the nodes and edges by all the other important epidemiological risk factors that determine whether disease can spread. For example, some cattle contact networks used to evaluate the spread of contagious mastitis pathogens include movements of bulls and calves, which are not milked and so are very unlikely to spread the disease. For these reasons, simulation modelling has become the gold standard for evaluating the impact of network structure on disease dynamics, as we will discuss later in this series.
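Average path length can be computed with a breadth-first search from every node. The sketch assumes a directed, unweighted network. Because unreachable pairs have no finite path length, it averages only over pairs that are connected; this is a common convention, but an assumption here. Adding one hypothetical long-distance "shortcut" edge to a chain of farms shows how a single jump shortens the average.

```python
from collections import defaultdict, deque

def path_lengths_from(source, succ):
    """Shortest path length (number of edges) from source to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def average_path_length(nodes, edges):
    """Mean shortest path length over all connected ordered pairs of distinct nodes."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    lengths = [d for s in nodes for t, d in path_lengths_from(s, succ).items() if t != s]
    return sum(lengths) / len(lengths) if lengths else float("inf")

nodes = ["A", "B", "C", "D", "E"]
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(average_path_length(nodes, chain))                 # 2.0
print(average_path_length(nodes, chain + [("A", "E")]))  # 1.7
```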
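A strongly connected component is a set of nodes in which every node can reach every other node along directed edges. A simple (though not the most efficient) way to find them is to intersect the set of nodes each node can reach with the set of nodes that can reach it. Linear-time methods such as Tarjan's algorithm are preferable for national-scale movement data. The network below is hypothetical.

```python
from collections import defaultdict, deque

def reachable(source, adjacency):
    """Nodes reachable from source by following the given adjacency lists."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def strongly_connected_components(nodes, edges):
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    components, assigned = [], set()
    for v in nodes:
        if v in assigned:
            continue
        # Nodes that v can reach AND that can reach v share v's component
        component = reachable(v, succ) & reachable(v, pred)
        components.append(component)
        assigned |= component
    return components

nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
         ("D", "E"), ("E", "D"), ("F", "A")]
components = strongly_connected_components(nodes, edges)
gscc = max(components, key=len)
print(sorted(gscc))  # ['A', 'B', 'C']
```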
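To see why dates matter, the sketch below compares the farms a hypothetical index farm can reach in the static network with those it can reach along time-respecting paths. For illustration, it assumes that a farm can pass infection on only through movements that leave after the day it became infected. Edges are processed in date order, so a single pass finds the earliest infection day for each farm.

```python
from collections import defaultdict, deque

# Hypothetical dated movements: (origin, destination, day)
timed_edges = [("A", "B", 5), ("B", "C", 3), ("B", "D", 8)]

def static_reach(source, timed_edges):
    """Nodes reachable when edge dates are ignored."""
    succ = defaultdict(list)
    for u, v, _ in timed_edges:
        succ[u].append(v)
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def time_respecting_reach(source, timed_edges, start_day=0):
    """Earliest day each node can be infected, following movements in date order.
    Assumption: a movement transmits only if it leaves strictly after the origin's infection day."""
    earliest = {source: start_day}
    for origin, destination, day in sorted(timed_edges, key=lambda e: e[2]):
        if origin in earliest and earliest[origin] < day and destination not in earliest:
            earliest[destination] = day
    return earliest

print(sorted(static_reach("A", timed_edges)))   # ['A', 'B', 'C', 'D']
print(time_respecting_reach("A", timed_edges))  # {'A': 0, 'B': 5, 'D': 8}
```

Farm C looks reachable in the static network, but the B → C movement happened before B could have been infected.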
In order to build contact networks, you also need to have good contact data. We’re lucky in the cattle world to have very good individual animal information because of national traceability requirements, but it is much more difficult for other systems. In these cases, the data often has to be collected through contact surveys, which are subject to the usual issues of non-response, poor recall, and limited timeframes of data. Since most of these networks are dynamic, the specific contact patterns obtained at the time of your cross-sectional survey will not be valid in the future although the broad network topology features tend to remain the same. Obviously, if your network is missing a lot of data, it is very difficult to calculate any of those other network measures.
There is unfortunately not enough time in this short course to cover all of the methods for estimating spatial transmission kernels. A kernel is essentially an equation that describes how the risk of infection (another β) changes as a function of distance from an infected farm. To use the equation, we need to know which susceptible farms are located near an infected farm and the distance between them, so that we can estimate the infection pressure. Probably the easiest way to do this in R is to create a distance matrix using tools in the spatstat package. You input a dataframe containing the coordinates of each node, and it will quickly generate a distance matrix. Because spatstat calculates straight-line (Euclidean) distances, use projected coordinates (e.g. NZTM eastings and northings in metres) rather than raw latitude and longitude. Also make sure your distance units match whatever kernel equation you are using.
To model the spread of disease through contact networks, we start by creating a matrix in which each row represents an individual farm and the columns represent the number of animals in each disease state. On each day of the simulation, we update the disease status of each farm individually using the differential equations, as we did before. We then take our list of contacts (origin, destination, date) and insert the movements as discrete events into the simulation code. Basically, if a movement occurs on the current simulation date, check the disease status of the origin farm. If the origin farm is infected, check the disease status of the destination farm. If the destination farm is susceptible, assign a probability of transmission based on the prevalence of disease in the origin herd and the weight of the contact (i.e. the number of animals moved). Update the disease status of the destination farm and continue on to the next date.
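The R/spatstat workflow above is not reproduced here. Instead, this Python sketch illustrates the same idea under stated assumptions:

- It builds a Euclidean distance matrix from hypothetical projected coordinates in kilometres.
- It applies a hypothetical exponential kernel. The functional form and the parameters `beta0` and `scale_km` are illustrative, not estimates.
- It sums the infection pressure from all infected farms.
- It converts that pressure to a daily probability of infection with 1 − exp(−pressure).

```python
import math

# Hypothetical farm locations in a projected coordinate system (km)
farms = {"F1": (0.0, 0.0), "F2": (3.0, 4.0), "F3": (6.0, 8.0), "F4": (30.0, 40.0)}
infected = {"F1"}

# Pairwise straight-line distances between all farms
dist = {(a, b): math.dist(farms[a], farms[b]) for a in farms for b in farms}

def kernel(d_km, beta0=0.5, scale_km=5.0):
    """Hypothetical exponential kernel: transmission rate (per day) decays with distance."""
    return beta0 * math.exp(-d_km / scale_km)

def daily_infection_probability(farm, infected):
    """Sum kernel pressure from every infected farm, then convert to a probability."""
    pressure = sum(kernel(dist[(i, farm)]) for i in infected)
    return 1 - math.exp(-pressure)

probs = {f: daily_infection_probability(f, infected) for f in farms if f not in infected}
for farm, p in probs.items():
    print(farm, round(dist[("F1", farm)], 1), "km:", round(p, 4))
```

Summing pressure across infected farms means a susceptible farm surrounded by several infected neighbours faces a higher daily risk than one near a single infected farm.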
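The steps above can be sketched in Python. Everything here is hypothetical:

- the herd sizes and the movement list;
- the within-herd SIR rates;
- the threshold of one infected animal used to call a herd "infected";
- the assumption that moved animals are drawn at random, so the chance of moving at least one infected animal is 1 − (1 − prevalence)^n.

For simplicity, moved animals are not transferred between herds; a newly infected destination herd simply gains one infected animal.

```python
import random

BETA, GAMMA = 0.3, 0.1  # hypothetical within-herd transmission and recovery rates (per day)
INFECTED = 1.0          # hypothetical threshold: a herd with >= 1 infected animal is infected

def within_herd_step(s, i, r):
    """One daily Euler step of frequency-dependent SIR differential equations."""
    n = s + i + r
    new_inf = BETA * s * i / n
    new_rec = GAMMA * i
    return [s - new_inf, i + new_inf - new_rec, r + new_rec]

def simulate(herds, movements, days, seed=1):
    """herds: farm -> [S, I, R]; movements: (origin, destination, day, n_moved)."""
    rng = random.Random(seed)
    herds = {farm: list(state) for farm, state in herds.items()}  # work on a copy
    by_day = {}
    for origin, dest, day, n_moved in movements:
        by_day.setdefault(day, []).append((origin, dest, n_moved))
    for day in range(days):
        # 1. Update within-herd dynamics on every farm
        for farm, (s, i, r) in list(herds.items()):
            herds[farm] = within_herd_step(s, i, r)
        # 2. Insert today's movements as discrete events
        for origin, dest, n_moved in by_day.get(day, []):
            s_o, i_o, r_o = herds[origin]
            if i_o < INFECTED or herds[dest][1] >= INFECTED:
                continue  # origin not infected, or destination already infected
            prevalence = i_o / (s_o + i_o + r_o)
            p_transmit = 1 - (1 - prevalence) ** n_moved
            if rng.random() < p_transmit:
                herds[dest][0] -= 1  # one susceptible animal becomes infected
                herds[dest][1] += 1
    return herds

herds = {"F1": [95.0, 5.0, 0.0], "F2": [200.0, 0.0, 0.0], "F3": [150.0, 0.0, 0.0]}
movements = [("F1", "F2", 3, 10), ("F2", "F3", 10, 20)]
for farm, (s, i, r) in simulate(herds, movements, days=30).items():
    print(farm, round(s), round(i), round(r))
```

Because transmission is random, repeating the simulation with different `seed` values gives a distribution of outcomes rather than a single answer.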