6. Identifying Error and Bias

We conduct epidemiological studies to identify exposures or risk factors that are associated with a particular disease outcome. Although we hope our study results reflect the true underlying disease processes, there are, unfortunately, many other reasons why we can find significant associations where none actually exist or fail to find significant associations where they actually do exist. Broadly, these reasons can be divided into two categories:

  • Random error – findings that differ from the truth purely by random chance

AND

  • Bias – findings that differ from the truth in a systematic and somewhat predictable direction, including:
    • Selection bias – arising from issues with how subjects were selected or recruited for the study
    • Information bias – arising from issues with assigning subjects into the correct exposure and disease categories
    • Confounding bias – arising when the exposure variable serves as a proxy for some other risk factor that is actually causing the disease outcome.

It is important that you learn to recognize when these errors or biases may have occurred and to predict what effects they may have had on the study findings. If we determine that study findings were not due to any of these sources of error, then the results are considered internally valid. In other words, the conclusions reached are likely to be correct for the circumstances of the particular study. That does not necessarily mean that the findings can be generalized to other circumstances or other populations (external validity). There are other criteria that must be fulfilled to determine if a study is externally valid.

Random Error

Random error is defined as the unpredictable variation in study results arising from factors that are largely beyond our ability to control. Imagine, for example, that we want to estimate the true prevalence of lameness in New Zealand dairy cattle. We randomly select 1,000 animals from a list of all dairy cattle in New Zealand and find that 12% are lame on physical examination. Had we randomly selected a different sample of 1,000 animals from New Zealand, we might just as easily have found that the prevalence was 20%. There was nothing inherently wrong with how we selected animals for each sample or with how we measured lameness – it was simply by random chance that the second sample contained more lame cattle.

The challenge with random error is that since we do not know what the truth is, we cannot easily determine whether we have over- or under-estimated it in our study. We can reduce random error by ensuring that we have selected an appropriate sample size or by taking a larger sample from the population. In the literature, you will usually see 95% confidence intervals reported for estimates of disease prevalence and measures of association. What that means is that if we repeated the sampling process many times, we would expect the calculated intervals to contain the true population value 95% of the time.

In general, the narrower the confidence interval, the more certainty we have in our point estimates. Whenever you see a p-value reported for a statistical test, it is telling you the probability of obtaining results at least as extreme as those observed if there were truly no effect (that is, by random chance alone). By convention, the cut-off value for significance is set at 0.05 (that is, we are willing to accept a 5% probability that the results occurred by chance). However, the lower the p-value, the stronger the evidence that the study results represent a true effect.
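To make this concrete, the lameness example above can be checked with a quick calculation. The sketch below uses the simple Wald approximation for the 95% confidence interval of a proportion (other interval methods exist and behave better near 0% or 100%):

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wald method)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

# Lameness example: 12% prevalence in a sample of 1,000 cattle
lower, upper = wald_ci(0.12, 1000)
print(f"95% CI: {lower:.3f} to {upper:.3f}")  # 0.100 to 0.140

# A larger sample narrows the interval, reducing random error
lower_big, upper_big = wald_ci(0.12, 10000)
```

Note how the interval from the sample of 10,000 is narrower than the interval from the sample of 1,000, even though the point estimate is the same.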

Bias

Bias is defined as systematic error that causes the observed study findings to differ from the truth in a relatively predictable direction. In the context of epidemiology, it does not imply that the researchers were intentionally prejudiced or intentionally influencing the results to obtain a desired outcome. Bias is primarily caused by unintentional issues with recruiting subjects into a study (selection bias), classifying the exposure and/or disease status of subjects (information bias) and/or choosing exposure variables that are proxies for other risk factors that are actually causing disease (confounding bias). Bias can also sometimes occur when the duration of follow-up for the study is too short. For example, bovine tuberculosis has a long latency period. Consequently, if you conducted a one-year study to determine if exposure to wildlife increased the risk of bovine tuberculosis, you might get a relative risk close to 1 simply because the animals did not have enough time to develop the disease. A key difference between random error and bias is that we cannot reduce bias by simply increasing the study sample size.

Bias can directly cause our estimates of disease frequency to be higher (positive effect) or lower (negative effect) than the true values. Returning to the previous example of lameness in New Zealand dairy cattle, let us say that the researcher was inexperienced and not very good at detecting lame animals. Due to this information bias, we might expect our estimates of prevalence to be lower than the true value. If we can quantify how often the researcher missed lame cattle, then we can adjust our estimates of prevalence to more accurately reflect the true value.
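The kind of adjustment described above can be sketched with the Rogan-Gladen estimator, which recovers the true prevalence from the apparent prevalence given the examiner's sensitivity and specificity. The 75% sensitivity figure below is a made-up value for illustration:

```python
def adjusted_prevalence(apparent, sensitivity, specificity=1.0):
    """Rogan-Gladen estimator: recover true prevalence from apparent
    prevalence, given the examiner's sensitivity and specificity."""
    return (apparent + specificity - 1) / (sensitivity + specificity - 1)

# Hypothetical: the inexperienced researcher detects only 75% of truly
# lame cattle (sensitivity = 0.75) but never calls a sound cow lame
# (specificity = 1.0). The apparent prevalence was 12%.
true_prev = adjusted_prevalence(0.12, sensitivity=0.75)
print(f"Adjusted prevalence: {true_prev:.2f}")  # 0.16
```

Under these assumed values, the true prevalence (16%) is higher than the observed 12%, consistent with the negative bias described above.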

The effects of bias on our measures of association depend on which group of individuals (that is, which cell in our two-by-two table) is over- or under-represented. In practice, it is very difficult to get precise numerical estimates for the effects of bias and so we usually say instead that bias has skewed the results either:

  • Towards the null: makes the observed measure of association closer to 1 than the true value. Therefore, the association appears weaker than it should.

OR

  • Away from the null: makes the observed measure of association farther from 1 than the true value. Therefore, the association appears stronger than it should.

If we were looking at the presence of visible hoof wall abnormalities as a risk factor for lameness in dairy cattle, there is a potential for information bias since the researcher may examine cows with abnormal hooves more carefully for lameness. This may lead to an over-representation of exposure positive and disease positive animals, which would make the association appear stronger than it should (bias away from the null). If we were running this as a cohort study to determine if the presence of hoof wall abnormalities was a risk factor for developing lameness over time, we could potentially see bias towards the null if farmers were more likely to cull animals with hoof abnormalities before they had a chance to develop lameness (under-representation of exposure positive and disease positive animals). This is a form of selection bias, which we will cover in the next section.

As you have probably already guessed, there is a lot of grey area in deciding whether study findings may have been biased. This is particularly the case when you have only limited information on the study methods and when the researchers have not provided sufficiently detailed discussion on the weaknesses in their approach. It is up to you as the reader to think about the potential sources of bias and determine whether they have been adequately addressed. After reading a paper, you may ultimately choose to wait for stronger evidence before modifying your clinical practices based on the study recommendations.

Selection Bias

To produce accurate estimates of disease, we would ideally be able to obtain information on every individual within the target population. The target population is the group of individuals we are interested in making inferences about and may be something as large as all commercial dairy cattle in the world or as small as Farmer A’s replacement heifer group. In the real world, it is usually too expensive and too impractical to conduct a census, that is, to collect information on all individuals in the target population. So we must settle for drawing inferences from a subset of these individuals. The source population is defined as the members of the target population that had a chance of being selected for inclusion in the study, while the study sample is the final group of animals that were analysed in the study.

There are several stages in the process of subject selection where we can lose potential subjects from the study: (1) deciding who we choose to sample from the target population; (2) finding individuals within that source population who are eligible and willing to participate in the study; and then (3) obtaining complete and accurate data for all subjects who were initially enrolled. If the subjects that were ultimately excluded from the final analysis are more likely to be from one particular exposure and disease group, then the study findings may be biased.

Selection bias is defined as systematic error that arises from the method used to select subjects for the study and from the factors that influence study participation. It occurs when the association between exposure and disease in study participants differs from those who did not participate. Given that we typically have no information about individuals that do not participate, the presence of selection bias is usually inferred rather than observed.

In choosing our study sample, we hope to end up with the same proportion of subjects in each exposure and disease category as the target population. In other words, a scaled-down version of the true two-by-two table will allow us to accurately determine the association between exposure and outcome.

One of the most common types of selection bias is non-response bias (also known as volunteer bias or self-selection bias), which occurs when the answers of participants differ from the potential answers of individuals who chose not to participate. For example, imagine sending out an electronic survey invitation asking owners about their opinions regarding homeopathic medicine: people with strong opinions or who have ill pets are more likely to respond. In longitudinal studies, attrition bias can occur when individuals drop out of a study. For example, amongst dogs enrolled in a diet trial for weight loss, the subjects most likely to drop out are those for whom the diet is not working.

Information Bias

As discussed in previous sections, it is crucial that we make the correct diagnosis of disease by combining information on history, clinical signs, physical examination and diagnostic test results to distinguish those with disease from those without. We can run into problems when that information is inaccurate. This can occur because owners have difficulty remembering past exposures, because the equipment we are using is faulty, or because the diagnostic tests we choose are seldom perfect. Research studies are subject to the exact same errors, which can lead to a particular type of bias called information bias.

Information bias is defined as systematic error that arises from inaccuracy in measuring the exposure status or outcome for the study participants. Data may be measured inaccurately because of:

  • Subject error;
  • Instrument error; or
  • Observer error.

For example, in a study evaluating the effects of hypertension on cats with hyperthyroidism, our blood pressure measurements may be falsely elevated because the animal was anxious at the time of measurement (subject error), the clinic’s Doppler machine may not have been calibrated correctly (instrument error), or the person taking the blood pressure readings may have hearing impairments (observer error).

When a subject is assigned into the wrong exposure or disease category, we call this misclassification bias. If diseased and non-diseased individuals are equally likely to have been misclassified into exposure groups (or if exposed and non-exposed individuals are equally likely to have been misclassified into disease groups), the information bias is said to be non-differential. Non-differential bias occurs in almost every epidemiological study.

To illustrate, imagine a cohort study investigating the relationship between administering vitamin B12 injections to newborn piglets and the development of pre-weaning scours. It is likely that some piglets in the treatment litters may have been missed or that piglets could have been transferred between litters for other management reasons. This may result in misclassification of exposure. Since we have no reason to believe that scouring piglets were more or less likely to be misclassified, the bias is non-differential, and its direction is more predictable. When the misclassified variable is dichotomous (for example, male/female or diseased/non-diseased), the resulting measure of association will always be biased towards the null (closer to 1).
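A small worked example, with made-up piglet numbers, shows this bias towards the null in action:

```python
def risk_ratio(a, b, c, d):
    """RR from a 2x2 table: a,b = exposed diseased/healthy; c,d = unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical 'true' cohort: 1,000 treated piglets (200 scour) and
# 1,000 untreated piglets (100 scour) -> true RR = 2.0
true_rr = risk_ratio(200, 800, 100, 900)

# Non-differential misclassification: 20% of each exposure group is
# recorded in the wrong group, regardless of disease status
m = 0.20
a = (1 - m) * 200 + m * 100   # diseased recorded as exposed
b = (1 - m) * 800 + m * 900   # healthy recorded as exposed
c = m * 200 + (1 - m) * 100   # diseased recorded as unexposed
d = m * 800 + (1 - m) * 900   # healthy recorded as unexposed
observed_rr = risk_ratio(a, b, c, d)  # 1.5 -- biased towards the null
```

Because the same 20% misclassification rate applies to diseased and non-diseased piglets alike, the observed risk ratio (1.5) is pulled towards 1 relative to the true value (2.0).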

Differential bias, on the other hand, occurs when a subject’s disease status affects their likelihood of being misclassified into an exposure category and vice versa. A common type of differential bias in case-control studies is recall bias, which can occur because subjects are asked about past exposures after disease has already been diagnosed. A classic example is case-control studies looking at risk factors for birth defects. It is argued that mothers who give birth to children with birth defects are able to recall exposures during early pregnancy with greater accuracy than mothers with healthy children. If we calculated the odds ratio here, it would be higher than the true value because the exposed-case cell of our 2×2 table is over-represented in the study sample compared with the target population.

It should be noted that we are talking about a difference in recall between outcome groups and NOT the more general problem of having people accurately report their exposure and disease status. The more general problem affects all groups to the same degree and will likely result in non-differential misclassification.
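To see how differential recall inflates the odds ratio, here is a sketch with invented numbers in which there is truly no association (OR = 1), but control mothers under-report their exposures:

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a,b = exposed/unexposed cases;
    c,d = exposed/unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical birth-defect study: 40 of 100 case mothers and 40 of 100
# control mothers were truly exposed, so the true OR is 1.0
true_or = odds_ratio(40, 60, 40, 60)

# Recall bias: case mothers report every exposure, but control mothers
# remember only 70% of theirs
cases_exposed = 40
controls_exposed = 0.70 * 40          # only 28 reported exposures
observed_or = odds_ratio(cases_exposed, 100 - cases_exposed,
                         controls_exposed, 100 - controls_exposed)
# observed_or is about 1.71 -- an apparent association created by
# differential recall alone
```

The exposure itself is equally common in both groups; the difference in *reporting* is what manufactures the association.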

In a cohort study, there can be differential bias in disease measurement if the follow-up and recording of disease status differ between exposed and unexposed study participants. To illustrate, suppose we have a study looking at the use of dry cow therapy to prevent mastitis. Differential bias may be present if the investigator conducted a detailed exam of all cattle that had dry cow therapy to determine if they had subclinical mastitis, but did not do the same for cattle that did not receive the treatment. The result is that the diagnosis of mastitis may be higher in the animals that received dry cow therapy, causing a bias away from the null.

Confounding Bias

In an ideal world, if we were running an observational research study to look at the effects of a particular exposure on a disease outcome (i.e. the effects of feeding a particular diet on obesity in mice), we would want the case and control populations to be identical in every single way except for the diet (i.e. genetically identical mice with the same age and sex housed in the exact same environment and provided with the exact same husbandry for the duration of the experiment). That way if we find any difference in the risk of obesity, we are virtually certain that the entire effect can be attributed to the diet rather than some other underlying difference between the two study populations.

In the real-world, it gets a lot messier because it is highly unlikely that we can find case and control populations that are exactly identical. There are probably going to be differences in the distribution of ages, breeds, sexes, diets, exercise regimes, and other lifestyle factors between individuals in each group. Let us say that we are interested in knowing whether coffee consumption (exposure) increases the risk of heart attacks (outcome) in university lecturers. The hypothesis makes plausible biological sense because we know that the caffeine in coffee has many cardiovascular side effects, such as increasing blood pressure and heart rate, which can affect normal cardiac function (causal pathway).

We conduct a simple cohort study that examines the relationship between average daily coffee consumption and the risk of having a heart attack in a group of 1,000 university lecturers. The results show that people who drink more coffee are indeed at significantly greater risk of having a heart attack. Based on these findings, we will recommend that people should stop drinking coffee and the risk of heart attacks in the heavy coffee drinking group should drop to the same level as the no coffee group, right?

Not so fast…. if you were asked to describe the behaviour and lifestyle of a person who drinks eight cups of coffee per day, what comes to mind? You would probably describe this person as someone with other lifestyle issues such as a high-pressure job, lack of sleep, poor diet, limited exercise or health problems. Any one of these factors could potentially increase stress on the heart and predispose individuals to having a heart attack. So, if we develop a public health campaign that aims to reduce coffee consumption, there is a good chance that it will not have the desired effect. All the heavy coffee drinkers we are targeting will still have all of those other lifestyle factors that are increasing their risk of a heart attack even if they were to reduce caffeine consumption to zero.

In epidemiology, we call this phenomenon confounding. This is when the apparent association between an exposure and an outcome can be at least partially explained by the presence of some other risk factor for disease that occurs more frequently in the exposed group than the unexposed group. In this case, heavy coffee drinkers (exposed group) are more likely to have stressful lifestyles (confounder) than people who do not drink coffee (unexposed group), which consequently increases their risk of having a heart attack (outcome).

Most epidemiological studies collect information on many different variables with the potential to confound the relationship between the exposure of interest and the disease outcome. For example, we may have also asked subjects in our coffee consumption study about their gender, age, height, eye colour, income and educational level as well as their general lifestyle.  We can use this information to “adjust” our estimates of the strength of the relationship between the exposure and outcome to get something closer to the true value.

 

Criteria for confounding

To determine whether we need to control for these variables as confounders, there are three important criteria that must be met:

The first criterion is that the confounder must be associated with the disease outcome. This means that individuals with the confounder are significantly more likely (if the confounder is a risk factor) or significantly less likely (if the confounder is a protective factor) to have the disease outcome. In our coffee consumption example, we might consider age as a confounder because older people are more likely to have heart attacks than younger people. We would not consider eye colour to be a confounder because there is no biologically plausible reason why someone with green eyes would have a greater risk of heart attacks than someone with blue eyes or brown eyes.

The second criterion is that the confounder must be associated with the exposure, but not caused by it. This means that individuals with the exposure are significantly more likely to have the confounder than unexposed individuals, but the exposure does not directly cause the individual to have the confounder. This is probably one of the more difficult concepts to grasp. The easiest thing to do is ask yourself: ‘If an individual in the exposed group suddenly became unexposed, would that automatically change whether they had the confounder?’ For example, people who are heavy coffee drinkers are more likely to be working-age adults than older retired adults. However, a working-age adult that suddenly stops drinking coffee today is still going to be a working-age adult tomorrow, and an older retired adult that stops drinking coffee today is still going to be an older retired adult tomorrow. Drinking coffee (exposure) has no causal effect on age (confounder). We use a double-headed arrow in the diagram to represent a non-causal relationship.

As a word of caution, sometimes variables can be confounders even when there is no biologically plausible reason why exposed individuals should be more likely to have the confounder. For example, we may not consider height to be a confounder, even though it is associated with heart attack risk, because there is no logical reason why tall people would be more likely to drink coffee than short people (assuming here that we are just studying fully-grown adults). However, it could be just random chance or some kind of selection bias that resulted in heavy coffee drinkers being taller on average than people who do not drink coffee. The only way we would know this confounding relationship existed is if we checked for statistical associations between the exposure and all other measured variables.

The third criterion is that the confounder must not act on the disease outcome through the same mechanism as the exposure. This means that the reason why the exposure increases the risk of the disease outcome must be different from the reason why the confounder increases the risk of the disease outcome. Let us say we were looking instead at the relationship between education levels and the risk of heart attacks in adults. We may hypothesize that more educated adults are less likely to have heart attacks than uneducated adults because they have more knowledge/ability to control risk factors. We know that education level is strongly associated with income, but we would not consider income to be a confounding factor because adults with higher income probably also have more knowledge/ability to control risk factors. These two variables (education level and income) are essentially measuring the same thing (knowledge/ability to control risk factors).

Putting everything together, if we can produce a diagram that looks like this, then we should consider the variable as a confounder in the study.

So how do we now determine whether coffee consumption truly has an effect on heart attack risk or whether the apparent relationship is simply due to those other confounding lifestyle factors? Ideally, we need to re-measure the association between coffee consumption and heart attacks in a group of individuals with the exact same lifestyles (that is, the prevalence of the confounding variable is the same for both the exposed and unexposed groups).

Controlling for confounding

If we have the ability to control the selection of subjects for our study at the design stage, there are three main methods we can use to ensure that individuals in the exposed group have similar lifestyles to individuals in the unexposed group:

The first method is restriction. We could restrict the study sample so that we selected only individuals with a high stress lifestyle. Or, alternatively, we could select only individuals with a low stress lifestyle to study. It does not really matter as long as the exposed individuals have the exact same level of stress as the unexposed individuals. Any relationship we then see between coffee consumption and heart attack risk is more likely to be caused by the cardiovascular effects of caffeine.

The second method is matching. If we selected one heavy coffee drinker with a high stress lifestyle, then we would also select one non-coffee drinker with a high stress lifestyle. Likewise, if we selected one heavy coffee drinker with a low stress lifestyle, then we would select one non-coffee drinker with a low stress lifestyle. This should mean that the exposed and unexposed groups have an equal distribution of the confounder (half of the heavy coffee drinking group will have the confounder and half of the non-coffee drinking group will have the confounder). Note: In a case-control study, matching will not remove the effect of confounding unless we subsequently use a special type of statistical analysis (conditional logistic regression), which is beyond the scope of this course.

The third method is randomization. If we were able to design a prospective intervention study, then we could randomly assign subjects to the heavy coffee consumption (exposed) and no coffee consumption (unexposed) groups. Again, we hope to get roughly the same proportion of individuals with a high stress or low stress lifestyle in each group. The main advantage of randomization is that it is also likely to produce groups with a similar distribution of variables that may be acting as confounders but were not anticipated or known when the study was designed.

If we cannot control the selection of subjects (which is often the case in epidemiology), the next best thing we can do is to collect data on the confounder. This allows us to control for confounding in two ways when it comes to analysing the data:

The first approach is stratification. We can divide the subjects into two groups: high stress lifestyle and low stress lifestyle. We re-calculate the risk ratio within each stratum and then re-combine the results to get a weighted average value for the risk ratio describing the relationship between coffee consumption and heart attack risk. When you see a study in the literature reporting the ‘adjusted’ risk ratio or odds ratio, it probably means that they have used stratification to control for confounding.
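As an illustration of how stratification works, the sketch below computes a Mantel-Haenszel weighted risk ratio (one common weighting scheme) for invented coffee/heart-attack data in which the crude association is entirely explained by lifestyle:

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio across strata.
    Each stratum is (a, n1, c, n0): exposed cases / exposed total,
    unexposed cases / unexposed total."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# Hypothetical coffee/heart-attack cohort, stratified by lifestyle.
# Within each stratum, the risk is identical in both exposure groups.
high_stress = (80, 400, 20, 100)   # risk 0.20 for drinkers and non-drinkers
low_stress = (5, 100, 20, 400)     # risk 0.05 for drinkers and non-drinkers

# Crude (unstratified) RR: 85/500 vs 40/500
crude_rr = ((80 + 5) / (400 + 100)) / ((20 + 20) / (100 + 400))
adjusted_rr = mantel_haenszel_rr([high_stress, low_stress])
# crude_rr is about 2.1, but adjusted_rr = 1.0: the crude association
# was entirely due to confounding by lifestyle
```

The confounding arises because most heavy coffee drinkers in these invented data have high-stress lifestyles; once stress is held constant within each stratum, the coffee effect disappears.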

The second approach is multivariate modelling. There are other advanced statistical methods where we essentially create a mathematical equation using the study variables to predict heart attack risk (for example, heart attack risk = (0.3 × coffee consumption) + (0.15 × lifestyle)). As long as the confounder is included as a variable in a multivariate model, any estimates you get for the risk ratio describing the relationship between coffee consumption and heart attack risk will have already been controlled for confounding due to lifestyle. When you see a study in the literature reporting the risk ratio or odds ratio from a multivariate model, it probably means that they have controlled for confounding.

One final note: just because a factor meets each of the three criteria for confounding, it is not automatically a confounder in that particular study. As a final check, we need to see whether the risk ratio changes after controlling for the confounder. If it is a true confounder that increases the risk of the outcome, we would expect the risk ratio or odds ratio for the exposure variable to decrease in magnitude after adjustment. So, if our study originally found that high coffee consumption increased the risk of heart attacks by a factor of 2.4 (RR = 2.4, 95% CI: 1.8 to 3.0), we might expect the risk ratio to decrease to something like 1.1 (RR = 1.1, 95% CI: 0.8 to 1.4) after adjusting for lifestyle factors. Conversely, if it is a true confounder that decreases the risk of the outcome, we would expect the risk ratio for the exposure variable to increase in magnitude after adjustment. If the suspected confounder actually has no effect on the relationship between the exposure and the outcome, then we would not expect to see any change in the risk ratio after adjusting for the confounder.
