4. Interpreting Diagnostic Tests

Being able to accurately determine whether an animal does or does not have a disease is fundamental to both clinical practice and research. In the consulting room, this information underpins diagnosis and management decisions, while in research settings it is essential for correctly classifying animals and generating reliable evidence. Any piece of information used to assign disease status can be considered a diagnostic test, from findings in the history and physical examination to laboratory analyses and imaging studies, and the confidence we place in our conclusions depends on understanding how these tests perform and how their results should be interpreted in context.

Clinical Measurements

For our purposes, a diagnostic test is defined as any process or device that can be used to detect disease in an individual, including:

  • medical history questions
  • physical examination findings
  • laboratory test results
  • diagnostic imaging studies
  • response to therapeutic interventions
  • post-mortem examinations

In an ideal world, the tests we choose would have a perfect ability to classify all individuals with the disease as positive and all individuals without the disease as negative on every single occasion. In the real world, however, tests are seldom perfect. This inevitably means that some diseased individuals will test negative (false negatives) while other non-diseased individuals will test positive (false positives). Quantifying how often tests are right or wrong under different circumstances can help us decide whether we can trust the test results.

All of the diagnostic tests we run in clinical practice can be grouped into three major categories based on the type of output data they generate:

BINARY DATA (also called NOMINAL DATA)

These tests produce results that are on a binary yes/no scale.  An example of this would be using rectal palpation to determine the pregnancy status of cows.  A cow will either be pregnant or not pregnant – there is no such thing as being “a little bit” pregnant.

CATEGORICAL DATA (also called ORDINAL DATA)

These tests produce results that are grouped into discrete categories that often indicate increasing level of severity.  An example of this would be the Rapid Mastitis Test (RMT) that subjectively assesses somatic cell count (SCC) levels in milk samples based on the degree of gelling when milk samples are mixed with the reagent and returns results of Negative, Trace, 1+, 2+, and 3+.

NUMERICAL DATA (also called INTERVAL OR CONTINUOUS DATA)

These tests produce results that can take on a range of numerical values that may either be whole numbers or have decimals.  An example of this would be measuring an animal’s heart rate and getting a numerical result measured in beats per minute.

Accuracy and precision

Most tests require us to measure some physical, biochemical or behavioural property of the individual to determine the disease status.

For our diagnostic tests that return numerical values, we also want to consider both the accuracy and precision of the results – in other words, how much we trust the values that the diagnostic test is producing.

  • ACCURACY: how close the measurements are to the correct value
  • PRECISION: how close a series of measurements are to each other

In an ideal world, all of our diagnostic tests would have high accuracy and high precision so that we could be confident that a single measurement taken on our patients returns a result that is pretty close to the correct value.  If a test has low accuracy and low precision, then we probably shouldn’t be using it as a diagnostic test.

For tests with high accuracy and low precision, we can compensate by taking the average of multiple measurements to get something that is hopefully close to the true value.  A good example of this is blood pressure measurements in cats, which can fluctuate quite a bit based on the stress levels of the cat as well as inaccuracies from the equipment.  We will usually take the average from three to five readings to get our overall estimates of blood pressure.
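To see why averaging helps with an accurate but imprecise test, here is a small simulation (the “true” pressure and the noise level are invented for illustration):

```python
import random
import statistics

random.seed(1)  # make the simulation repeatable

TRUE_SBP = 140  # hypothetical true systolic blood pressure (mmHg)

def noisy_reading(true_value, sd=15):
    """One reading from an accurate but imprecise device:
    unbiased on average, but with a lot of random scatter."""
    return random.gauss(true_value, sd)

# A single reading can land well away from the true value...
single = noisy_reading(TRUE_SBP)

# ...but averaging several readings pulls the estimate back towards it,
# which is exactly why we take three to five blood pressure readings.
averaged = statistics.mean(noisy_reading(TRUE_SBP) for _ in range(5))
```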

For tests with low accuracy and high precision, we can often simply adjust the measurement if we know how much it tends to over or underestimate the true value.  A good example would be measuring blood glucose from whole blood samples instead of serum samples.  In whole blood samples, the red blood cells are still metabolically active and will continue to use up glucose from the sample.  We therefore expect blood glucose levels to be lower in whole blood samples that have been sitting around for a while compared with serum samples where the blood glucose levels should be more stable since there are no cells that are consuming the glucose.
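Correcting a consistent bias like this is simple arithmetic. In the sketch below, the glucose decay rate is a purely hypothetical figure, not a reference value:

```python
# Assumed (hypothetical) rate at which red cells consume glucose in an
# unseparated whole-blood sample left at room temperature
GLUCOSE_LOSS_PER_HOUR = 0.4  # mmol/L per hour -- illustration only

def corrected_glucose(measured, hours_since_collection):
    """Estimate glucose at collection time by adding back the amount
    assumed to have been consumed by the red blood cells."""
    return measured + GLUCOSE_LOSS_PER_HOUR * hours_since_collection

# A sample measured at 4.3 mmol/L after sitting for 2 hours
# gives an estimated 5.1 mmol/L at the time of collection.
estimate = corrected_glucose(4.3, 2)
```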

Reference Ranges

With tests that report binary data, it’s pretty easy to interpret the test results since they will directly tell us whether the animal has the condition of interest or not.  It becomes a little bit trickier with tests that report numerical data (and to a lesser extent categorical data) because we need to decide which values are considered normal and abnormal.  Let’s say that we are submitting serum from a dog for a routine biochemical screening profile.  When we get the lab results back, we will typically see the point estimate for the analyte and what’s called a reference range.

In Troy’s lab report below, for example, you can see that the blood glucose measurement (GLU) from his sample is 5.1 with a reference range of 3.9 to 6.1.

Where do reference ranges come from and what do they mean?  Well, typically, researchers will collect samples from an appropriate number of healthy animals, measure the analyte, and then plot the frequency distribution of these test values.  For most biological variables, these tend to follow normal distributions (bell-shaped curves) with a peak around the average value that tapers off at the tails of the distribution.  To get the reference range, we find the lower and upper bounds between which we would expect the test results for 95% of clinically normal animals to fall.
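For an analyte that follows a normal distribution, this amounts to taking the mean ± 1.96 standard deviations. A minimal sketch, using invented glucose values from hypothetical healthy animals:

```python
import statistics

# Invented blood glucose values (mmol/L) from healthy reference animals
values = [4.8, 5.2, 4.5, 5.6, 5.0, 4.9, 5.3, 4.7, 5.1, 5.4,
          4.6, 5.0, 5.5, 4.9, 5.2, 4.8, 5.3, 5.0, 4.7, 5.1]

mean = statistics.mean(values)
sd = statistics.stdev(values)

# ~95% of normally distributed values lie within 1.96 SD of the mean
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
print(f"Reference range: {lower:.1f} to {upper:.1f} mmol/L")
```

(In practice a reference population would be far larger than 20 animals, and analytes that are not normally distributed are handled with percentile-based ranges instead.)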

On the flip side, that also means that 5% of clinically normal animals will return a test result that falls outside the reference range.  We therefore always need to interpret the results with some degree of caution and consider a holistic view of the patient’s health as well as other relevant diagnostic parameters.  Sometimes we will also look at general trends in the values over time, as progressive increases or decreases in a value could indicate a worsening of the animal’s condition.

Testing for infectious diseases

The diagnostic tests we have for infectious diseases are all primarily geared towards either finding the pathogen itself or finding evidence that the animal has mounted an immune response against the pathogen.  It is important to have a good understanding of the disease pathogenesis (the timeline for how the disease progresses in the animal from the initial infection all the way through the range of possible clinical outcomes) so that we can make sure we’re conducting the appropriate diagnostic test at the appropriate time in the appropriate animals, and therefore making appropriate inferences about their current epidemiological status based on the results.

Screening for Pathogens

With these diagnostic tests, we are looking directly for the presence of the pathogen itself which would indicate that the animal is currently and actively infected.  These tests can include things like bacterial culture, fungal culture, virus isolation, faecal examination, blood smear, antigen ELISA, and PCR.  One of the big determinants of test sensitivity is the pathogenesis of the disease. For some diseases like bovine tuberculosis, Johne’s disease, or feline leukaemia virus, the pathogen can become latent inside the body and so an infected animal may not actually be shedding the pathogen in large enough quantities to produce a positive result on the day we test it.  Sometimes we might also be testing too soon after exposure for the pathogen to have had the chance to start replicating.

Screening for an Immune Response

These diagnostic tests look for the presence of antibodies against the pathogen and are usually performed on whole blood or serum samples where antibody concentrations are likely to be highest.  Since it can take anywhere from one to three weeks for animals to develop a measurable immune response to the pathogen, these probably aren’t the best diagnostic tests to run on animals that have only just potentially been exposed to the pathogen.

A positive result on a test that looks for antibodies generally means one of five things:

Natural infection: The animal has been infected with the pathogen and mounted an immune response against the pathogen.  For diseases that result in transient infections such as bovine viral diarrhoea (BVD), this could indicate that the animal has already likely cleared the pathogen.  For diseases with chronic shedding and carrier states such as Johne’s disease, this could indicate that the animal is likely currently infected with the pathogen.

Previously vaccinated: Many diagnostic tests are unable to distinguish between antibodies produced in response to a vaccine and those produced in response to a natural infection.  Therefore, vaccinated animals may test positive even though they have never been exposed to the pathogen.  You may come across the terms DIVA test or DIVA vaccine, which refer to products specifically designed to allow Differentiating Infected from Vaccinated Animals.

Maternal antibodies:  When animals ingest colostrum, they can receive a good dose of antibodies depending on the dam’s previous exposure.  These maternally-derived antibodies provide young animals with passive immunity until their own immune system can respond more effectively.  It can take weeks to months for these maternal antibodies to degrade, so some young animals will test positive on serological tests even though they have never been exposed.

Cross-reaction: Some antibodies that animals produce can also recognise closely related pathogens.  For example, the antibody tests for bovine viral diarrhoea virus (BVD) will also come back positive for animals that were infected with border disease virus (BDV), another pestivirus in the same family.  The tests for bovine tuberculosis (Mycobacterium bovis) can also come back positive if cattle were exposed to Johne’s disease (Mycobacterium avium subsp. paratuberculosis).

False positive: Sometimes for reasons unknown, there is something else about the sample that could trigger the test to be positive even though the animal doesn’t have any antibodies against the pathogen in their system.

For some diseases where the antibodies can stick around for years after the initial infection, these tests also might not be a reliable indicator of when the infection occurred.  If we are unsure about our serological diagnosis, there are some clinical circumstances where we would repeat the test in 2 to 4 weeks to look for an increase in the antibody titres that could indicate the animal was recently infected or is still going through the process of clearing the pathogen.

Relating Pathogenesis to Results

When we’re testing for an infectious disease, it is useful to know the different possible epidemiological states an animal can be in and what that means about their potential diagnostic test results.

A useful framework for thinking about infectious disease pathogenesis is through compartmental diagrams that describe the various mutually-exclusive epidemiological states an animal can progress through, from the time it is initially infected through all the different clinical outcomes.  Some common states include:

  • Susceptible: the animal has not been exposed to the disease before
  • Exposed: the animal has been exposed to the pathogen, but is not yet infectious
  • Infectious: the animal is actively shedding the pathogen and is infectious to others
  • Latent: the animal is infected with the pathogen, but it is hidden/dormant in the body
  • Recovered: the animal has mounted an immune response to clear the infection
  • Maternally Immune: the animal is passively immune to the pathogen from antibodies it got from the dam
  • Vaccinated: the animal is protected against infection through vaccine-derived immunity

Consider the diagram below that outlines the pathogenesis of bovine viral diarrhoea virus.  Susceptible animals that are infected with BVD develop transient infections that last for approximately 2 to 3 weeks before the animals develop a sufficient immune response to clear the virus.  Recovered animals are thought to be protected against re-infection for life.  If a susceptible dam gets transiently infected during days 40 to 120 of pregnancy, before the fetus has a fully functioning immune system, the calf becomes permanently unable to develop an antibody response to the infection and is born persistently infected, shedding large quantities of virus for life.  Calves born to recovered or vaccinated dams may receive BVD antibodies through colostrum, which can stick around for up to 10 months depending on the level of passive transfer.  Protection from vaccination against BVD lasts about 6 to 12 months, so animals require an annual booster to prevent them from becoming susceptible to the infection again.

This means that an animal testing positive for antibodies could be in the recovered, vaccinated, maternal antibodies, or transiently infected states and so we would probably want to avoid testing young animals and animals who have previously been vaccinated against BVD using tests that screen for antibodies.  Animals that test positive for the virus are either in the transiently or persistently infected states and so we may want to re-test virus positive animals in 2-3 weeks to see if the infection disappears. 

Assessing Test Performance

Whenever we perform a diagnostic test, we need to have a good awareness of how well it classifies animals as being positive or negative for the disease.  The classic approach for assessing the overall effect of precision, accuracy and cut-off values on test performance is to run the test on a group of individuals with the disease and a group of individuals without the disease to see how the results compare.  The true disease status of individuals is determined by a ‘gold standard’ method, which is often a more invasive or more expensive test that we are hoping to replace.  For example, the gold standard for diagnosing tuberculosis in cattle is identifying lesions on post-mortem examination.  Obviously, it would be ideal if we could find an ante-mortem test that provided a high level of performance.

Measures of test performance

The results of the comparison between the test and the gold standard are then presented in a two-by-two table. Note that the rows represent the test status of individuals, while the columns represent the disease status. It is important to keep these tables in a consistent format to avoid confusion.

The prevalence of disease may be calculated by dividing the total number of disease positives (a + c) by the total number of individuals (N). From this information, we can derive four important measures of test performance:

  • Sensitivity (Sn) – the proportion of all truly diseased individuals who are correctly identified as positive on the test. A highly sensitive test has a good chance of detecting disease in affected individuals.
  • Specificity (Sp) – the proportion of all truly non-diseased individuals who are correctly identified as negative on the test. A highly specific test has a low chance of mistakenly classifying an unaffected individual as disease positive.
  • Positive Predictive Value (PPV) – the probability that an individual who tests positive for the disease truly has the disease. If a test with a high PPV comes back positive, there is a good chance that the individual truly has the disease.
  • Negative Predictive Value (NPV) – the probability that an individual who tests negative for the disease is truly disease free. If a test with a high NPV comes back negative, there is a good chance that the individual truly does not have the disease.

If we know the sensitivity and specificity of the test and the prevalence of disease, then the PPV and NPV can also be calculated using the following formulas (just make sure you use proportions (0.97) rather than percentages (97%) or the numbers will turn out wrong):
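These calculations come straight from Bayes’ theorem and are easy to sketch in code (the 0.95 sensitivity, 0.90 specificity and 0.30 prevalence used here match the disease control example that follows):

```python
def predictive_values(sn, sp, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence,
    all expressed as proportions (0.97), not percentages (97%)."""
    ppv = (sn * prevalence) / (sn * prevalence + (1 - sp) * (1 - prevalence))
    npv = (sp * (1 - prevalence)) / (sp * (1 - prevalence) + (1 - sn) * prevalence)
    return ppv, npv

# Sensitivity 0.95, specificity 0.90, prevalence 0.30
ppv, npv = predictive_values(0.95, 0.90, 0.30)  # ppv ~0.80, npv ~0.98
```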

Impact of prevalence on test performance

We consider sensitivity and specificity to be inherent properties of the diagnostic test that remain constant regardless of the population being tested. It does not matter what the prevalence of disease is in the population; if a diseased individual is tested, they will always have the same probability of being detected. Similarly, if a healthy individual is tested, they will always have the same probability of being negative.

In contrast, the positive and negative predictive values directly depend on the prevalence of disease in the population being tested. To see how this works, let us consider the implementation of a disease control program that uses a test with a sensitivity of 0.95 and a specificity of 0.90 to identify diseased individuals. In the early stages of the program, the prevalence is 30% and, if we sampled 1000 animals, we would expect to see the two-by-two table shown below where disease prevalence was 30% (300/1000), sensitivity was 95% (285/300) and specificity was 90% (630/700).

This gives us a PPV of 285/355, which means that 80% of animals that test positive for the disease truly have the disease. The NPV of 630/645 means that 98% of animals that test negative for the disease are truly disease free.

If the prevalence subsequently dropped to 3% as a result of program success and we again sampled 1,000 animals, we would see the PPV decrease to about 23% (28.5/125.5 in expected counts) and the negative predictive value increase to about 99.8% (873/874.5). Although nothing has changed with the test sensitivity and specificity, the difference in prevalence has caused the absolute number of true positives to decrease and the absolute number of false positives to increase.
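These expected counts can be reproduced directly (fractional animals are fine here because they are expected values, not an actual sample):

```python
def expected_counts(n, prevalence, sn, sp):
    """Expected true/false positives and negatives when testing n animals."""
    diseased = n * prevalence
    healthy = n - diseased
    tp, fn = diseased * sn, diseased * (1 - sn)
    fp, tn = healthy * (1 - sp), healthy * sp
    return tp, fp, fn, tn

for prev in (0.30, 0.03):
    tp, fp, fn, tn = expected_counts(1000, prev, sn=0.95, sp=0.90)
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")
# at 30% prevalence: PPV ~80%, NPV ~98%
# at  3% prevalence: PPV ~23%, NPV ~99.8%
```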

As another way of thinking about it, let us imagine that two cats present to your clinic for a feline immunodeficiency virus (FIV) test. This disease is predominantly spread through saliva in bite wounds. The first cat is an outdoor cat with lots of scars from previous fights, while the second cat is an indoor only cat that has never been in contact with other animals. If both cats test positive, which result are you most likely to believe? Most likely the first cat because it originates from a high risk (high prevalence) population. The second cat is more likely to be a false positive because it originates from a low risk (low prevalence) population. This highlights the importance of always interpreting diagnostic test results alongside other historical and clinical information about the patient.

We can also summarise the relationship between predictive values, sensitivity, specificity and prevalence as:

  • The more sensitive a test, the better its negative predictive value (that is, if a highly sensitive test comes back negative, then we can probably rule OUT disease)
    • For example: Virtually all cats with diabetes have elevated blood glucose levels. If we test a cat and its blood glucose levels come back normal, then it is highly unlikely to have diabetes. The test, however, is not very specific. There are a number of reasons why cats can have elevated blood glucose and so if the test comes back positive, we need to perform additional testing to confirm diabetes.
  • The more specific a test, the better its positive predictive value (that is, if a highly specific test comes back positive, then we can probably rule IN disease)
    • For example: Demodicosis is a skin infection of dogs and cats caused by mites that live in hair follicles. If we identify a Demodex mite by examining a skin scrape under the microscope, then we can be almost certain that the patient has demodicosis. The test, however, is not very sensitive. Mites are notoriously difficult to find and so if the test comes back negative, we cannot rule out demodicosis.
  • If the prevalence increases, positive predictive value increases and negative predictive value decreases.
    • For example: If we perform an ACTH stimulation test on a dog that has classic clinical signs of Cushing’s disease (weight gain, increased thirst and urination, and symmetrical alopecia), we are more likely to believe a positive test result than a negative test result.
  • If the prevalence decreases, positive predictive value decreases and negative predictive value increases.
    • For example: If we perform an intradermal tuberculin skin test on a cow from a low risk region for bovine tuberculosis in New Zealand, we are more likely to believe a negative test result than a positive test result. When a disease is rare (with a prevalence generally less than 0.01), the specificity of a test is rarely high enough to give an adequate positive predictive value.

Factors Influencing Test Performance

There are actually quite a few steps involved in conducting diagnostic tests that can each influence our ability to find animals that truly have the disease. 

  • Diseased animals must have or be displaying the thing we are looking for on the test date. This is not always the case for diseases with clinical signs that wax/wane or infectious diseases where pathogens are only shed intermittently or when an animal may not have been diseased long enough to develop detectable signs.
  • We have to be able to collect the right samples from the right location. There are often a range of possible samples we can test (e.g. blood, serum, tissue, milk, urine) and some tests work better on some samples than others.  Sometimes animals may not tolerate having a test performed, so we might not be able to get as good results as we hoped for.
  • The samples have to be handled and processed correctly before we can run the diagnostic test on them. Samples can degrade over time if they are not stored correctly (e.g. in the correct containers or at the correct temperatures) or if there are delays in shipping them to a diagnostic laboratory.  This is often more of an issue when you are conducting sampling out on farms.
  • The diagnostic test methodology itself has to work correctly to generate a result. Very few diagnostic tests are 100% perfect even when they are run absolutely correctly, so there is a chance they may give us false positive or false negative results.
  • We have to interpret the test result correctly. When the test does generate a value or result, we have to interpret it correctly to classify the animal as being positive or negative for the disease.  As we’ll discuss further below, we can influence the sensitivity and specificity of some tests by changing our criteria (cut-off value) for where we draw the line between positive and negative results.

Say, for example, that we want to test a cow for Staphylococcus aureus mastitis using a bacterial culture conducted on a milk sample. 

It’s one of the harder diseases to diagnose because infected animals only shed the pathogen intermittently into the milk, so there is a chance there may not be many bacteria in the milk on the day you collect the sample.  The sample has to be plated on an appropriate culture medium and incubated at the correct temperature for a day or so to create appropriate conditions for the bacteria to grow.  It’s sometimes a difficult bacterium to grow, so we may not get any colonies even if there are bacteria present in the sample and we have provided the correct conditions.

The overall sensitivity is only around 60%, but the specificity is almost 100% because no other pathogens look quite like it on bacterial culture.

Understanding Cut-off values

Setting the cut-off value is one aspect of diagnostic test performance that we can adjust after the results have already been obtained.

To understand how this works, let’s consider the case of “Charlie”, our chubby cat friend on the right who is a 6 year old neutered male DSH presenting to the clinic because the owner noticed he was drinking and peeing a lot more than usual over the past few weeks.  One of the differential diagnoses that should immediately jump to the top of your list is diabetes mellitus since we know that obesity is a major risk factor for cats developing the disease.  We also know the clinical signs of diabetes mellitus are related to animals having too much glucose in their blood because they either are not producing enough insulin (Type I diabetes) or the insulin receptors are not working correctly to push glucose into cells (Type II diabetes).

The main test we have to diagnose diabetes mellitus is measuring blood glucose levels, which can easily be done patient-side with handheld glucometers.  After discussing our concerns with the owner, we manage to collect a blood sample from a very stressed out Charlie and the results come back as 11 mmol/l.  How confident are we now in diagnosing Charlie as being diabetic?

The answer is that we look at results from research studies where blood samples were taken from cats known to be disease negative and cats known to be disease positive, and their blood glucose levels measured.  We then look at plots of the frequency distributions for the two groups to see what the ranges of values look like.

In an ideal world, there would always be a complete separation in values between the two groups and we would be able to set a very clear cut-off point to classify animals as being positive or negative based on the test value.  As you can see in the diagram below, any animal with a test value that is above the cut-off point would be accurately classified as disease positive and any animal with a test result that is below the cut-off point would be accurately classified as disease negative.

The real world is unfortunately a little bit messier than that and there is often some degree of overlap between the two groups as you can see in the next diagram below.  If we were to set the cut-off point in the middle now, there will be a small number of disease positive animals that will be classified as negative because they have test values lower than the cut-off point (false negative) and a small number of disease negative animals that will be classified as positive because they have test values above the cut-off point (false positive).

Depending on our objectives for performing the test, we might decide to shift the cut-off point away from the middle.  If we make the cut-off point a lower value (i.e. shift it to the left), you can see that we will catch more disease positive animals (increased sensitivity) but at the expense of having a lot more false positive animals (decreased specificity).  If we make the cut-off point a higher value (i.e. shift it to the right), you can see that we will miss more disease positive animals (decreased sensitivity) but we will have far fewer false positives on the test (increased specificity).  There will almost always be some trade-off between sensitivity and specificity.

As you can imagine, we could potentially choose to set the cut-off value at any point along the x-axis from 0 (all the way to the left) up to infinity (all the way to the right).  So how do we choose the optimal cut-off point?  Well, it depends on what we are trying to achieve.
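The trade-off is easy to demonstrate with simulated data. The two distributions below are invented for illustration (healthy cats centred around 5 mmol/L, diabetic cats around 18 mmol/L, overlapping because of stress hyperglycaemia):

```python
import random

random.seed(42)  # make the simulation repeatable

# Simulated blood glucose values (mmol/L) for the two known groups
healthy = [random.gauss(5, 2) for _ in range(1000)]
diabetic = [random.gauss(18, 6) for _ in range(1000)]

def sn_sp(cutoff):
    """Sensitivity and specificity if every value >= cutoff is called positive."""
    sn = sum(v >= cutoff for v in diabetic) / len(diabetic)
    sp = sum(v < cutoff for v in healthy) / len(healthy)
    return sn, sp

# Sliding the cut-off to the right trades sensitivity for specificity
for cutoff in (7, 10, 13):
    sn, sp = sn_sp(cutoff)
    print(f"cut-off {cutoff} mmol/L: Sn {sn:.0%}, Sp {sp:.0%}")
```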

Receiver Operating Characteristic (ROC) Curves

To find the cut-off point that optimises the trade-off between sensitivity and specificity, researchers will make what’s called a Receiver Operating Characteristic (ROC) Curve that plots the true positive rate on the y-axis against the false positive rate on the x-axis across the entire range of different possible cut-off points.  An absolutely perfect diagnostic test would produce a single point in the top left corner, which means that it produces 100% true positives and 0% false positives no matter where you set the cut-off point.  A completely useless diagnostic test would produce a straight line with a slope of 1, meaning that at every possible cut-off value the true positive rate equals the false positive rate – the test provides no information beyond chance.  You may as well flip a coin rather than running your diagnostic test!

A real-world diagnostic test would produce something more like the blue curve, where each location on the line corresponds to one of the different possible cut-off points.  We trace along the blue line until we find the point that is closest to the top left corner (i.e. closest to perfect) and use whatever cut-off value generated that point on the curve as our threshold for calling animals positive or negative.

Another measurement you will often hear about is the Area Under the ROC Curve (AUC) which is a rough estimate of how well the test performs in classifying the animal into the correct disease state.   A perfect test would have an AUC of 100% while flipping a coin would produce an AUC of 50%.  We generally want to see AUC values greater than 70% in order for the diagnostic test to have some practical value.
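A sketch of how an ROC curve and its AUC can be computed from raw test values, assuming two invented distributions of known-negative and known-positive animals:

```python
import random

random.seed(0)

# Invented test values for animals of known disease status
negatives = [random.gauss(5, 2) for _ in range(500)]
positives = [random.gauss(9, 2) for _ in range(500)]

# Sweep every observed value as a candidate cut-off to trace the curve
roc = [(0.0, 0.0)]  # points are (false positive rate, true positive rate)
for cutoff in sorted(set(negatives + positives), reverse=True):
    tpr = sum(v >= cutoff for v in positives) / len(positives)
    fpr = sum(v >= cutoff for v in negatives) / len(negatives)
    roc.append((fpr, tpr))
roc.append((1.0, 1.0))

# Area under the curve by the trapezoidal rule
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(roc, roc[1:]))
```

Helpfully, the AUC also has a direct interpretation: it is the probability that a randomly chosen diseased animal returns a higher test value than a randomly chosen healthy one.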

Improving Test Performance

One way to improve either the sensitivity or specificity of our diagnostic process is to use a combination of tests. When multiple tests are used, we can choose to interpret the results in series or in parallel.  When using a series interpretation, an individual is declared positive only if all tests return a positive result. Series interpretation maximises specificity at the cost of sensitivity. To illustrate, consider a scenario in which we have two independent tests that, when used on their own, both have a sensitivity of 90% and a specificity of 80%. If we only consider an individual to be diseased if both tests are positive, then 0.9 × 0.9 = 0.81 of all patients with disease would test positive on both, making the sensitivity of the combination 81%, which is less than either test on its own. However, the specificity would improve because those who are negative for the combination include everyone who tested negative on either test: 80% of those without disease would test negative on the first test and, of the 20% who did not, 80% would test negative on the second test, making the specificity of the combination 0.8 + (0.2 × 0.8) = 96%.

When interpreting the test results in parallel, an individual is declared positive if at least one of the multiple tests returns a positive result. Interpreting test results in parallel increases the sensitivity and decreases the specificity. To understand this, let us again consider two tests that have a sensitivity of 90% and a specificity of 80%. When interpreting the tests in parallel, 90% of those with disease would test positive for the first test and, of the remaining 10%, 90% would test positive on the second test, making the sensitivity 0.9 + (0.1 × 0.9) = 99%. This substantial increase in sensitivity has come at the cost of specificity because an individual will only be considered disease free if both tests are negative. The probability of both tests being negative is 0.8 × 0.8 = 64%.
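The arithmetic above generalises to any pair of independent tests. A minimal sketch:

```python
def series_interpretation(sn1, sp1, sn2, sp2):
    """Positive only if BOTH tests are positive (assumes independent tests)."""
    sn = sn1 * sn2                 # must be detected by both tests
    sp = sp1 + (1 - sp1) * sp2     # a negative on either test clears the animal
    return sn, sp

def parallel_interpretation(sn1, sp1, sn2, sp2):
    """Positive if AT LEAST ONE test is positive (assumes independent tests)."""
    sn = sn1 + (1 - sn1) * sn2     # a positive on either test flags the animal
    sp = sp1 * sp2                 # must be negative on both tests
    return sn, sp

# Two tests, each with Sn 90% and Sp 80%:
sn_s, sp_s = series_interpretation(0.9, 0.8, 0.9, 0.8)    # Sn ~0.81, Sp ~0.96
sn_p, sp_p = parallel_interpretation(0.9, 0.8, 0.9, 0.8)  # Sn ~0.99, Sp ~0.64
```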

The underlying assumption in this section was that the tests are independent. Obviously, this is rarely the case, and we would need to use a more complex approach to determine changes in sensitivity and specificity, which cannot be calculated as easily. However, the same principles will apply; that is, combinations of tests can be used to increase either the sensitivity or specificity at the cost of the other, depending on how disease is defined.

Back to the example of our diabetic cat, we could try re-checking glucose measurements in a few days (testing in series).  One of my favourite cheap ways to do this with stressy cats was to have the owners collect a free-catch urine sample from non-absorbable litter at home a few days later and then check a urine dipstick for glucose.  As you will learn in pathology, the blood glucose levels in diabetic cats are above the renal threshold for re-absorption and so these guys will virtually always have glucose present in their urine, whereas we would expect a cat with stress hyperglycaemia to be negative once it has settled down at home.

Another slightly more expensive option for testing in parallel would be to check a serum fructosamine level, which indicates whether glucose levels have been elevated over a longer period of time rather than just transiently from stress hyperglycaemia.  For patients that go on long-term insulin therapy, this is also a good monitoring test to make sure that blood glucose levels are being adequately controlled on the current treatment regimen.

Screening versus Confirmation Tests

For many of the diseases on our differential diagnosis list, there are often several tests we can choose between to help determine whether our patient is positive or negative for the disease.  These tests may vary quite widely in their performance, cost, availability, turnaround time, technical skill required to perform them, and risk of causing harm to the patient.  It takes some clinical judgement to decide which tests to use for initial screening, to maximise our chances of finding all potentially positive animals, and which to use for follow-up confirmation, to further distinguish true positives from false positives.

Screening tests

For screening tests, we usually want to choose something with high sensitivity that is also quick, cheap, and safe to perform so that we can easily rule out diseases on our differential diagnosis list.  Although it may seem slightly counter-intuitive, if an animal comes back negative on a test with a high sensitivity, then we are pretty confident that it does not have the disease.

A good example of this is using blood glucose measurements to screen cats for diabetes.  If the blood glucose comes back in the normal range (i.e. test negative), then we are pretty confident that our cat is not diabetic.  If it comes back positive, then we might want to consider running a confirmation test since cats can get a mild to moderate hyperglycaemia simply from the stress of being at the vet clinic and having their blood taken.
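To see why a positive screening result often warrants confirmation, it helps to work through the positive predictive value (the probability that a test-positive animal truly has the disease). The sketch below uses invented figures for illustration only; the prevalence, sensitivity, and specificity are assumptions, not values from the text:

```python
# Hypothetical illustration of positive predictive value (PPV).
# All numbers below are assumed for the sake of the example.

def ppv(se, sp, prevalence):
    """P(disease | positive test), from sensitivity, specificity, prevalence."""
    true_pos = se * prevalence                 # diseased animals that test positive
    false_pos = (1 - sp) * (1 - prevalence)    # healthy animals that test positive
    return true_pos / (true_pos + false_pos)

# Suppose 5% of tested cats are truly diabetic, and a single blood glucose
# reading has Se = 0.95 but only Sp = 0.70 (stress hyperglycaemia causes
# many false positives in the clinic).
print(f"PPV = {ppv(0.95, 0.70, 0.05):.2f}")   # ≈ 0.14
```

Under these assumed figures, only around 14% of glucose-positive cats would actually be diabetic, which is exactly why a second, more specific test (or a repeat test at home) is worth running before committing to a diagnosis.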

Confirmation Tests

Once we have a suspicion that an animal may have the disease based on our screening test and/or compatible clinical signs, there are a couple of different options for confirming our diagnosis.  We could repeat the screening test at a later time point to see if it stays positive (running tests in series), or we could run different tests to see if they also give us positive results (running tests in parallel). For confirmation tests, we want something with high specificity to help us rule in differentials on our list. It may again seem counter-intuitive, but a positive result on a highly specific test means that the animal is very likely to have the disease.

The principles of multiple testing are used when developing a screening program. In a screening program, individuals who otherwise appear healthy are screened for disease to determine whether it is present in the population. Ideally, the test should be easy to administer and low in cost. It also should be a highly sensitive test so that it misses only a small number of diseased animals. Its specificity should still be reasonable, so that the number of false positives subjected to the confirmatory test remains economically justifiable. Individuals that return a negative result to the screening test are regarded as true negatives and not given any further test. Individuals positive to the screening test are subjected to a confirmatory test. The confirmatory test can require more technical expertise, more sophisticated equipment, and be more expensive because it is only applied to a reduced number of samples. However, it also has to be highly specific, so that any positive reaction to the confirmatory test is considered a definitive positive.
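The flow of animals through such a two-stage program can be sketched numerically. All figures below (population size, prevalence, and the performance of the two tests) are invented purely to illustrate the structure described above:

```python
# Sketch of a two-stage screening program: screen everyone with a cheap,
# sensitive test, then confirm the screen-positives with a more expensive,
# highly specific test (series interpretation). All figures are assumptions.

population = 10_000
prevalence = 0.01                        # 1% of animals truly diseased (assumed)

screen_se, screen_sp = 0.95, 0.80        # sensitive but not very specific
confirm_se, confirm_sp = 0.90, 0.99      # specific, run only on screen-positives

diseased = population * prevalence       # 100 animals
healthy = population - diseased          # 9,900 animals

# Stage 1: screening test applied to the whole population
screen_tp = diseased * screen_se         # ≈ 95 true positives carried forward
screen_fp = healthy * (1 - screen_sp)    # ≈ 1,980 false positives carried forward

# Stage 2: confirmatory test applied only to the screen-positives
final_tp = screen_tp * confirm_se        # ≈ 85.5
final_fp = screen_fp * (1 - confirm_sp)  # ≈ 19.8

print(f"Screen-positives sent for confirmation: {screen_tp + screen_fp:.0f}")
print(f"Final positives: {final_tp + final_fp:.1f} "
      f"(of which {final_tp:.1f} truly diseased)")
```

The point of the sketch is the workload, not the exact numbers: the expensive confirmatory test is run on roughly 2,000 animals rather than 10,000, and the highly specific second stage strips out most of the screening test's false positives.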

An example of a screening program is the current tuberculosis program in New Zealand. At the moment, all cattle and deer are regularly tested for tuberculosis using a skin-fold test, which is cheap and has a relatively high sensitivity but a low specificity. Consequently, nearly all animals infected with tuberculosis will be detected; however, the test positives will also include a large number of animals that do not have the disease. Follow-up testing is therefore required to determine the TB status of any animal that reacts.

In clinical practice, we often use the same principles in diagnosing disease in individual patients. For example, you may identify low thyroid levels in a geriatric dog that had routine annual screening blood work and then perform additional diagnostic testing to confirm the presence of hypothyroidism.

When to perform diagnostic tests

After going through this review, you may be wondering when we should actually be performing diagnostic tests in practice.  Just because we have diagnostic tests available doesn’t always mean that we can or should be running them for every case.  My general rule of thumb is that you should only run a test when the results have the potential to change the course of action for your clinical recommendations.
