Phase 5: Analysis

Phase 5 is about converting the data gathered from your research study into actionable insights that can advance understanding of the field and inform improvements in policy, practice, or outcomes. This phase focuses on making sense of the data in a way that is transparent, rigorous, and meaningful for the intended audience.

The key activities in this phase include:

  1. Preparing and cleaning the data for analysis, including checking data quality, completeness, and consistency.
  2. Analysing and presenting findings using appropriate methods, such as descriptive statistics, inferential statistical analyses, qualitative coding and thematic analysis, or mixed-methods integration.
  3. Critically evaluating the findings in light of the research question, study design, data limitations, and existing literature, and considering the strength, reliability, and practical significance of the results.

Step 1: Preparing and Cleaning Data

Quantitative Data

Where possible, start with raw data files and perform all cleaning and processing steps within a statistical software package. This approach creates a clear, auditable record of every transformation applied to the data, reduces the risk of manual errors, and improves reproducibility. Avoid making changes directly in spreadsheets unless absolutely necessary.

Preserve original data files
Begin by saving original copies of all raw datasets in a secure, read-only folder that will not be modified. These files should remain untouched so that you always have a definitive reference point if errors are introduced during processing or if analyses need to be repeated.

Import and familiarise yourself with the data
Import the dataset into your statistical software and take time to understand its structure. Check variable names, formats, coding schemes, and the total number of observations.

Run descriptive statistics on all variables
Generate basic descriptive statistics for every variable to identify unexpected values, implausible ranges, or inconsistencies. This step helps ensure that the data match what you expect based on the study design and data collection tools.

During this process, you may need to:

  • Recode free-text responses into structured categories where appropriate.
  • Combine or redefine categories for categorical variables with very low response counts to improve analytical stability.
  • Convert continuous variables into categorical variables if this aligns with the research question or planned analyses.
  • Decide how to handle missing data, including whether values should be left as missing, imputed, or trigger exclusion from specific analyses.
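Worked in a scripted workflow, the recoding and collapsing decisions listed above might look like the following minimal Python sketch. The variable names, free-text responses, and category mappings are hypothetical, for illustration only; the point is that each decision is explicit and repeatable rather than made by hand in a spreadsheet.

```python
from collections import Counter

# Hypothetical free-text occupation responses (illustrative only)
raw_responses = ["vet", "Vet.", "farmer", "veterinarian", "Farmer ", "teacher", ""]

# Recode free text into structured categories
recode_map = {"vet": "Veterinarian", "veterinarian": "Veterinarian", "farmer": "Farmer"}

def recode(response):
    """Normalise a free-text response and map it to a category.

    Empty strings are treated as missing (None) rather than guessed.
    """
    cleaned = response.strip().rstrip(".").lower()
    if not cleaned:
        return None          # leave missing values explicitly missing
    return recode_map.get(cleaned, "Other")

coded = [recode(r) for r in raw_responses]

# Collapse categories with very low counts into "Other" for analytical stability
counts = Counter(c for c in coded if c is not None)
MIN_COUNT = 2
collapsed = [c if c is None or counts[c] >= MIN_COUNT else "Other" for c in coded]

print(Counter(c for c in collapsed if c is not None))
```

Because every transformation lives in the script, re-running it on the preserved raw file reproduces the cleaned dataset exactly.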

Address incomplete or partial records
For survey data in particular, establish clear criteria for excluding observations based on partial or incomplete responses. These rules should be defined a priori where possible and applied consistently across the dataset.

Document sample derivation
It is good practice to document how the final analytical dataset was derived from the target population. Flow diagrams are particularly useful for showing inclusion and exclusion steps, response rates, and reasons for data loss, and can later be adapted for publication or reporting.
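The counts that feed such a flow diagram can also be derived in a script, so that the published figure always matches the analytical dataset. The stage labels and numbers below are hypothetical, purely to illustrate the structure:

```python
# Each step is (label, n remaining); numbers are hypothetical for illustration.
steps = [
    ("Invited to participate", 500),
    ("Responded to survey", 320),
    ("Complete responses", 290),
    ("Met eligibility criteria", 270),
]

def derivation_flow(steps):
    """Return (label, n, number excluded since the previous step) per stage."""
    flow = []
    previous = None
    for label, n in steps:
        excluded = 0 if previous is None else previous - n
        flow.append((label, n, excluded))
        previous = n
    return flow

for label, n, excluded in derivation_flow(steps):
    print(f"{label}: n={n}" + (f" (excluded {excluded})" if excluded else ""))
```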

Qualitative Data

Where possible, begin analysis using verbatim transcripts that are directly derived from the original audio recordings, and perform all cleaning, preparation, and organisation steps in a systematic and documented way. This approach preserves the integrity of participants’ accounts, creates a transparent audit trail of decisions made during data preparation, and supports rigour and reproducibility in qualitative analysis. Avoid ad hoc editing of transcripts or undocumented changes outside your agreed workflow, as these can introduce bias and undermine analytic credibility.

Transcription accuracy and completeness

Ensure that transcripts accurately and fully reflect the original recordings before analysis begins. This includes checking transcripts against audio files, confirming that all interviews or focus groups have been transcribed, and verifying that transcripts correspond to the correct participants or sessions.

Anonymisation and ethical compliance

Review transcripts to remove or replace identifying information such as names, locations, organisations, or specific events. Apply consistent participant IDs or pseudonyms, and store any re-identification keys separately and securely in line with ethics approvals.

Standardisation and formatting

Apply consistent formatting conventions across all transcripts, including speaker labels, timestamps, paragraphing, punctuation, and notation for pauses or non-verbal cues. Standardisation improves readability and supports systematic coding.

Handling ambiguity and analytic scope

Identify and manage inaudible, ambiguous, or non-analytic sections in a consistent and transparent way. Inaudible segments should be clearly marked rather than inferred. Decisions about retaining or excluding interviewer prompts, interruptions, or off-topic discussion should be made a priori and documented.

Data structure, security, and version control

Ensure that transcript text is clearly separated from metadata such as interviewer notes or demographic information, or explicitly distinguished within analysis software. Store files securely, apply version control, and ensure that edits are traceable and reversible.

Readiness for analysis and documentation

Confirm that transcripts import correctly into qualitative analysis software and that participant identifiers and case attributes align with the planned analytic framework. Maintain a clear audit trail documenting cleaning decisions, conventions, and exclusions to support transparency and reporting.

Step 2: Analysing and Presenting Findings

This step focuses on transforming cleaned data into findings that directly address the research question. Analysis should be planned, systematic, and transparent, with clear links between the research aims, analytical choices, and interpretation of results. While quantitative and qualitative analyses use different techniques, both require careful documentation, critical reflection, and attention to validity and rigour.

Quantitative Data

Describing the study sample

Quantitative analysis typically begins with a clear description of the study population. This provides essential context for interpreting all subsequent findings and allows readers to assess who the results apply to. Common characteristics to summarise include sample size, age distribution, sex or gender, geographic location, and other variables relevant to the research question, such as occupation, production system, disease status, or socioeconomic indicators.

Where possible, these characteristics should be compared with those of the target or source population using external data such as census statistics, administrative datasets, industry reports, or registries. Comparing the study sample with the broader population helps assess representativeness and identify potential selection or response biases that may influence interpretation.
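As a simple illustration of such a comparison, a chi-square goodness-of-fit test can check whether the sample's sex distribution differs from hypothetical census proportions. All counts and proportions below are invented for the sketch; 3.84 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
# Observed sample counts and hypothetical census proportions (illustrative)
observed = {"female": 60, "male": 40}
census_proportions = {"female": 0.52, "male": 0.48}

n = sum(observed.values())
# Chi-square goodness-of-fit statistic against the census distribution
chi2 = sum(
    (observed[k] - n * census_proportions[k]) ** 2 / (n * census_proportions[k])
    for k in observed
)

# 3.84 is the 5% critical value of the chi-square distribution with 1 df
print(f"chi-square = {chi2:.2f}; differs from census at 5% level: {chi2 > 3.84}")
```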

Basic descriptive statistics

Descriptive statistics are used to summarise and explore individual variables before formal hypothesis testing. The choice of summary measures depends on the type of data and its distribution.

Categorical variables are typically summarised using counts and proportions, often presented in tables or bar charts. Continuous variables are summarised using measures of central tendency, such as means or medians, alongside measures of variability such as standard deviations, interquartile ranges, or ranges. Graphical displays including histograms, boxplots, or density plots are useful for visualising distributions and identifying skewness.

Descriptive analysis provides an important opportunity to identify outliers, implausible values, missing data patterns, or unexpected trends. These findings may inform decisions about data transformation, variable categorisation, or the choice of subsequent statistical methods.
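The summaries and checks described above can be sketched with the Python standard library alone; the measurements are hypothetical, with one deliberately implausible value to show how Tukey's 1.5 × IQR fences flag potential outliers:

```python
import statistics

# Hypothetical continuous measurements, including one implausible value
values = sorted([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 21.0])

mean = statistics.mean(values)
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles
iqr = q3 - q1

# Flag values outside Tukey's 1.5 * IQR fences as potential outliers
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(f"mean={mean:.2f}, median={median}, IQR={iqr:.2f}, outliers={outliers}")
```

Note how the mean is pulled upwards by the extreme value while the median is not, which is exactly the kind of skewness a descriptive pass is meant to surface.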

Statistical analysis

After descriptive exploration, formal statistical analyses are used to address the research question directly. Analytical methods should align with the study design, data structure, outcome type, and assumptions of the statistical models used. Where possible, planned analyses should be specified a priori to reduce the risk of data-driven inference.

Common analytical approaches include:

  • comparisons between groups using appropriate tests, such as t-tests, non-parametric alternatives, or chi-square tests
  • regression modelling to examine associations between exposures and outcomes while adjusting for potential confounders
  • longitudinal or repeated-measures analyses for data collected over time
  • survival or time-to-event analyses when outcomes relate to duration or timing
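As one concrete instance of the first approach above, the Welch's t statistic for comparing two group means can be computed from first principles. The data are hypothetical, and in practice a statistical package (for example scipy.stats.ttest_ind with equal_var=False) would also return the p-value:

```python
import math
import statistics

# Hypothetical outcome measurements for two groups
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [6.8, 7.1, 6.4, 7.0, 6.6]

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    Does not assume equal variances across the two groups.
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```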

Model assumptions should be checked and reported, including linearity, normality, independence, and homoscedasticity where relevant. Diagnostic plots and formal tests can help assess whether assumptions are met. Sensitivity analyses are often valuable for evaluating the robustness of findings to alternative modelling choices, variable definitions, or inclusion criteria.
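A minimal sketch of such checks, using a closed-form ordinary least squares fit on hypothetical data, is shown below. Real analyses would use dedicated regression diagnostics (residual plots, formal tests); here the residuals are simply inspected for a zero mean and for roughly constant spread across the range of the predictor:

```python
import statistics

# Hypothetical exposure (x) and outcome (y) with a roughly linear relationship
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]

# Ordinary least squares fit (closed form for a single predictor)
mx, my = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Crude homoscedasticity check: residual spread at low vs high values of x
half = len(residuals) // 2
spread_low = statistics.pstdev(residuals[:half])
spread_high = statistics.pstdev(residuals[half:])

print(f"slope={slope:.2f}, max |residual|={max(abs(r) for r in residuals):.2f}")
print(f"residual spread (low x)={spread_low:.2f}, (high x)={spread_high:.2f}")
```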

Quantify effect size and uncertainty

After statistical models have been fitted and assumptions checked, results should be summarised using effect estimates and measures of uncertainty. Effect sizes describe the direction and magnitude of associations or differences observed in the data, while measures of uncertainty indicate the precision of these estimates.

Effect measures should be chosen to match the outcome and study design, such as mean differences, risk ratios, odds ratios, rate ratios, regression coefficients, or predicted probabilities. Confidence intervals should be reported alongside point estimates to convey the range of values consistent with the data.
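For example, an odds ratio from a 2 × 2 table with a Wald 95% confidence interval can be computed as below. The cell counts are hypothetical; 1.96 is the standard normal 97.5th percentile used for a two-sided 95% interval:

```python
import math

# Hypothetical 2x2 table: rows = exposed/unexposed, columns = case/non-case
a, b = 30, 70   # exposed: cases, non-cases
c, d = 15, 85   # unexposed: cases, non-cases

odds_ratio = (a * d) / (b * c)

# Wald 95% confidence interval, constructed on the log-odds scale
se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96        # standard normal 97.5th percentile
lo = math.exp(math.log(odds_ratio) - z * se_log)
hi = math.exp(math.log(odds_ratio) + z * se_log)

print(f"OR = {odds_ratio:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Reporting the full interval, rather than the point estimate alone, makes the precision of the estimate immediately visible to the reader.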

When multiple comparisons, outcomes, or models are examined, the potential for chance findings should be considered at this stage. This may involve adjustment for multiple testing, prioritisation of pre-specified primary outcomes, or cautious interpretation of secondary and exploratory analyses. Effect estimates should be interpreted in light of this broader analytic context, with attention to consistency of findings across related analyses rather than isolated statistically significant results.
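One widely used adjustment is Holm's step-down procedure, sketched below on hypothetical p-values. It is uniformly less conservative than a plain Bonferroni correction while still controlling the family-wise error rate:

```python
# Hypothetical p-values from four pre-specified comparisons
p_values = [0.001, 0.02, 0.03, 0.2]
alpha = 0.05

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down adjustment: which hypotheses can be rejected?

    Returns a list of booleans in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break            # once one test fails, all larger p-values fail
    return reject

print(holm_bonferroni(p_values, alpha))
```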

Interpretation should focus on the size and practical relevance of effects rather than statistical significance alone. Where appropriate, effect estimates should be interpreted in relation to clinically, biologically, or policy-relevant thresholds. Uncertain or imprecise estimates should be acknowledged explicitly, particularly when confidence intervals include values consistent with both meaningful effects and little or no effect.

Document analytic decisions

Throughout the analysis process, analytic decisions should be recorded in a clear and systematic way. This includes documenting data exclusions, variable construction, transformations, model specifications, assumption checks, and sensitivity analyses.

Where possible, all analyses should be conducted using scripted workflows within statistical software, with scripts retained and archived alongside the dataset. Documentation should be sufficient to allow another researcher to understand the sequence of analytical steps and, where appropriate, reproduce the results. Clear documentation supports accurate reporting, peer review, and future re-analysis, and strengthens the credibility of the study’s findings.

Qualitative Data

Qualitative analysis is an iterative, reflexive process that seeks to identify patterns of meaning within rich, contextual data. While specific approaches vary, the phases below reflect a commonly used thematic analysis framework and can be adapted to other qualitative methodologies.

Phase 1. Familiarisation with the data

Analysis begins with immersion in the dataset. Researchers read and re-read transcripts, field notes, or documents to gain an overall sense of the content and context. For audio-recorded data, familiarisation often begins during transcription. Initial ideas, recurring issues, and points of interest are noted through informal memo-writing. This phase supports sensitivity to nuance and context in later coding.

Phase 2. Generating initial codes

The dataset is systematically examined to identify meaningful features relevant to the research question. These features are labelled with codes, which may capture explicit content or underlying concepts. Coding may be inductive, deductive, or a combination of both, depending on the study aims and theoretical framework. Coding is typically applied across the entire dataset to ensure comprehensive coverage.
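Where coding is done outside dedicated qualitative software, even a very simple structure that links each code to its supporting extracts preserves the audit trail. The codes, participant IDs, and quotations below are entirely hypothetical:

```python
from collections import defaultdict

# Map each code to the extracts (participant ID, quote) that support it.
# Codes and quotes are hypothetical, for illustration only.
codebook = defaultdict(list)

def apply_code(code, participant, extract):
    """Attach a coded extract to the codebook, preserving its source."""
    codebook[code].append((participant, extract))

apply_code("access barriers", "P01", "the clinic is too far to reach by bus")
apply_code("access barriers", "P04", "appointments are only during work hours")
apply_code("trust in providers", "P01", "I only go if my usual nurse is there")

# Summarise coverage: how many extracts, from how many participants, per code
for code, extracts in codebook.items():
    participants = {p for p, _ in extracts}
    print(f"{code}: {len(extracts)} extracts from {len(participants)} participants")
```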

Phase 3. Searching for themes

Codes are examined and organised into potential themes that represent broader patterns of meaning. This phase involves grouping related codes, identifying candidate themes and subthemes, and exploring relationships between them. Visual tools such as tables, concept maps, or thematic diagrams can help structure thinking and support comparison across cases.

Phase 4. Reviewing themes

Candidate themes are reviewed and refined to ensure they are coherent, distinct, and well supported by the data. This involves checking whether coded data within each theme form a meaningful pattern and whether the thematic structure accurately reflects the dataset as a whole. Themes may be merged, split, redefined, or discarded during this phase.

Phase 5. Defining and naming themes

Once a stable thematic structure is established, each theme is clearly defined and named. This phase focuses on articulating the core meaning of each theme, what makes it distinct, how it relates to the research question, and how themes relate to one another. Detailed theme descriptions are developed to support interpretation and reporting.

Phase 6. Producing the analytic narrative

The final phase involves integrating themes into a coherent analytic narrative supported by illustrative data extracts. Findings are interpreted in relation to the research question, existing literature, and study context. Implications for practice, policy, or future research are identified, and the limits of interpretation are acknowledged.

Where feasible, it is good practice for more than one person to review the analysis workflow and interpretation of results. A second reviewer can help identify coding errors, inappropriate analytic choices, or over-interpretation of findings, and can provide a valuable check on whether conclusions are adequately supported by the data. This review may be informal, such as an internal team discussion or code walk-through, or more structured, such as independent replication of key analyses. Incorporating multiple perspectives strengthens analytic rigour, improves clarity of reporting, and reduces the risk that errors or unchecked assumptions influence the final conclusions.

Mixed-Methods Studies

For studies using both quantitative and qualitative data, analysis should not occur in parallel silos. Integration can occur at multiple points, including:

  • using qualitative findings to explain or contextualise quantitative results
  • examining convergence or divergence between datasets
  • developing joint displays that align numerical results with themes, narratives, or illustrative quotations

Explicitly describing how integration occurred strengthens interpretive depth and helps readers understand how different forms of evidence contribute to the overall conclusions.

Step 3: Critically Evaluating the Findings

The final step in analysis involves stepping back from the results to evaluate what they mean, how much confidence can be placed in them, and how they should be used. This phase moves beyond statistical or thematic outputs to interpretation and synthesis, integrating the study’s findings with the research question, study design, and existing evidence base. In practice, it corresponds to writing the discussion and conclusion sections of a manuscript, where results are interpreted in light of prior literature and their broader implications are considered.

Interpretation in context

Findings should be interpreted in relation to the research question, study design, and existing evidence, with explicit consideration of how much confidence can reasonably be placed in the results. This includes reflecting on sample size, precision of estimates, consistency across analyses, and alignment with findings from prior studies, as well as whether the study design supports causal inference or only descriptive or associative conclusions.

Interpretation should balance what the data suggest with the limits of what the study can support. Uncertainty should be acknowledged explicitly, and conclusions framed in a way that avoids over-interpretation while remaining informative and relevant.

Internal validity and sources of bias

Critically assess the extent to which the findings are likely to reflect true effects rather than artefacts of bias, error, or uncontrolled variability. Key considerations include:

  • selection bias and the representativeness of the study sample
  • measurement error or misclassification of exposures, outcomes, or covariates
  • residual confounding in observational analyses
  • missing data, attrition, or loss to follow-up

Rather than simply listing potential sources of bias, evaluation should consider their likely direction and magnitude, and how they may influence interpretation. This allows readers to judge the robustness of the conclusions and the conditions under which they are most credible.

External validity and applicability

Consider the extent to which the findings are applicable beyond the specific study setting. This includes reflecting on differences between the study population and the populations of interest, as well as contextual factors such as geography, systems, resources, cultural norms, or regulatory environments.

For intervention or programmatic studies, consider whether the intervention is feasible, acceptable, or scalable in other settings. For observational studies, be clear about the populations to which the findings are most likely to generalise, and where caution is warranted.

Implications for practice, policy, and future research

The final component of critical evaluation is translating findings into implications that are appropriate to the strength of the evidence. This may include identifying practical or clinical relevance, policy or management implications, gaps revealed by the analysis, and methodological lessons for future studies.

Implications should be proportionate, clearly linked to the results, and transparent about uncertainty. Avoid overstating conclusions or recommending action beyond what the evidence can reasonably support.

When things didn't go as planned...

Null, weak, or contradictory findings should be treated as informative rather than as failures. Valuable insights can still emerge when results do not align with initial expectations, as such findings may highlight incorrect assumptions underlying the research question, limitations in study design or measurement, challenges in implementation or fidelity, or a true absence of effect in the studied context. Explicit reflection on what did not work, and why, contributes to cumulative knowledge, supports refinement of methods and theory, and represents a legitimate but often underreported component of high-quality research.

Evaluating Research Studies

Learn more about the research publication process and how to read journal articles with a critical eye, so that you can judge whether, and how, the key findings can be applied to your work.
