Project methods: Overview

Overview

This list of methods and definitions was taken from the National Centre for Research Methods.

Automated methods

Automated methods in data analysis include using scripts, algorithms and machine learning techniques to process and analyse data without manual intervention. This might include Reproducible Analytical Pipelines (RAP).
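
As an illustration, the sketch below shows one automated, script-based pipeline step: data are read, cleaned, summarised and written out with no manual intervention. The file and column names are hypothetical placeholders.

```python
# One automated pipeline step: ingest, clean, analyse, publish.
# "raw_survey.csv", "region" and "income" are hypothetical placeholders.
import pandas as pd

def run_pipeline(in_path: str, out_path: str) -> None:
    raw = pd.read_csv(in_path)                          # ingest
    clean = raw.dropna(subset=["income"])               # basic validation
    summary = clean.groupby("region")["income"].mean()  # analyse
    summary.to_csv(out_path)                            # publish output

if __name__ == "__main__":
    run_pipeline("raw_survey.csv", "regional_income_summary.csv")
```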

Data mining

Data mining techniques uncover valuable trends, patterns and relationships within datasets.

Data mining methods incorporate analytical techniques and can follow established, hypothesis-driven analysis. They can also follow data-driven approaches, where patterns are discovered directly from the data.

Examples of data mining methods are:

  • data fusion
  • neural networks
  • machine learning

Data fusion

The process of combining and integrating data from multiple sources to create a more comprehensive representation of the underlying information.

Data fusion involves merging records that represent different entities. This is different to data linkage, where the aim is to link records from different sources that relate to the same individual.
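
A minimal sketch of data fusion, using hypothetical area-level sources: the two datasets describe the same areas rather than the same individuals, and combining them gives a more comprehensive picture.

```python
# Sketch of data fusion: two hypothetical sources each contribute
# different information about the same areas (not the same people).
import pandas as pd

survey = pd.DataFrame({"area": ["A", "B"], "mean_income": [31000, 28500]})
admin = pd.DataFrame({"area": ["A", "B"], "benefit_claims": [120, 210]})

# Fuse on the shared area code to build a richer representation.
fused = survey.merge(admin, on="area")
print(fused)
```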

Neural networks

Neural networks are a form of machine learning that can model complex nonlinear relationships by simulating the structure of the brain.

These models learn from data by adjusting the weights of connections, improving their ability to make predictions.
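
As a toy illustration of weight adjustment, the sketch below fits a single artificial neuron (mathematically equivalent to logistic regression) to made-up data by gradient descent; real networks stack many such units in layers.

```python
# One sigmoid neuron learning by gradient descent on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # two input features
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # pattern to learn

w = np.zeros(2)                            # connection weights
b = 0.0                                    # bias
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))     # sigmoid activation
    w -= 0.5 * X.T @ (p - y) / len(y)      # adjust weights down the
    b -= 0.5 * np.mean(p - y)              # gradient of the log-loss

print("learned weights:", w, "bias:", b)
```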

Machine learning

Development of algorithms that can learn from data to identify patterns and relationships to make predictions and solve problems.
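
A minimal sketch of that workflow on synthetic data: a model learns a hidden pattern from labelled examples and is then scored on unseen cases.

```python
# Learn patterns from labelled data, then predict on held-out cases.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)    # hidden pattern to learn

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = RandomForestClassifier(random_state=1).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```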

Descriptive statistics

Descriptive statistics summarise and describe the characteristics of data. They provide information on the spread of scores (variance) as well as the midpoint (measure of central tendency).

Descriptive statistics include:

  • correlational analysis
  • effect size
  • levels of measurement
  • variance estimation

Correlational analysis

Measures the strength and direction of the relationship between two variables, determining positive or negative associations.
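
For example, a Pearson correlation coefficient computed on made-up data:

```python
# The sign of r gives the direction of the association,
# its magnitude the strength.
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])
test_score = np.array([52, 55, 61, 64, 70, 75])

r = np.corrcoef(hours_studied, test_score)[0, 1]
print(f"Pearson r = {r:.2f}")  # near +1: strong positive association
```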

Effect size

Measures of effect size communicate the size of a relationship or difference between variables.
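
One common measure is Cohen's d, sketched here on made-up data: the difference between two group means expressed in pooled standard-deviation units.

```python
# Cohen's d for two small illustrative samples.
import numpy as np

group_a = np.array([5.1, 4.8, 5.5, 5.0, 5.2])
group_b = np.array([4.2, 4.5, 4.1, 4.4, 4.0])

n_a, n_b = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1) +
                     (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")
```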

Levels of measurement

This relates to how each variable is measured, referring to different types of data, such as:

  • nominal
  • ordinal
  • interval
  • ratio

Variance estimation

Estimates of variance measure the variability and spread of data. This includes measurements such as confidence intervals.
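
For example, a sample variance and a 95% confidence interval for the mean, computed on a small made-up sample:

```python
# Variance and a t-based 95% confidence interval for the mean.
import numpy as np
from scipy import stats

sample = np.array([23.1, 19.8, 25.4, 22.0, 24.3, 20.9])

var = sample.var(ddof=1)                        # sample variance
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error
low, high = stats.t.interval(0.95, len(sample) - 1,
                             loc=sample.mean(), scale=se)
print(f"variance = {var:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```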

Econometrics

Application of statistical methods to estimate and measure the impact of various factors on economic outcomes.

Techniques can include:

  • regression analysis
  • time series analysis

These examine relationships between variables and make predictions about economic behaviour.

Event history analysis

Event history analysis is a statistical method used to analyse the occurrence of specific events over time.

The approach includes:

  • hazard analysis
  • survival analysis
  • duration analysis
  • Poisson analysis

Hazard analysis

Focuses on estimating the instantaneous rate at which events occur over time.

Survival analysis

Concerned with studying the time until an event of interest occurs.
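
A hand-rolled sketch of the Kaplan-Meier estimator, one standard survival analysis tool, on made-up follow-up data (dedicated libraries offer production implementations):

```python
# Kaplan-Meier: probability of remaining event-free beyond each
# observed event time, accounting for censored observations.
import numpy as np

time = np.array([3, 5, 5, 8, 12, 13, 13, 18])  # follow-up times
event = np.array([1, 1, 0, 1, 1, 0, 1, 1])     # 1 = event, 0 = censored

surv = 1.0
for t in np.unique(time[event == 1]):
    at_risk = np.sum(time >= t)                # still being followed
    d = np.sum((time == t) & (event == 1))     # events at time t
    surv *= 1 - d / at_risk
    print(f"S({t}) = {surv:.3f}")
```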

Duration analysis

Also known as "time-to-event" analysis. Examines the time it takes for an event to occur.

Poisson analysis

Focuses on the occurrence of events that happen at a specific rate over time.
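
For example, with an assumed average rate, the Poisson distribution gives the probability of observing a given number of events in a fixed interval:

```python
# Poisson probabilities for k events, given an illustrative rate.
from scipy import stats

rate = 2.5  # assumed average events per interval
for k in range(5):
    print(f"P({k} events) = {stats.poisson.pmf(k, rate):.3f}")
```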

Latent variable models

A latent variable cannot be directly observed; its presence is inferred from observed variables. Latent variable models are statistical frameworks used to analyse the underlying factors, or latent variables, that influence observed variables.

There are different types of latent variable models, such as:

  • graphical modelling
  • latent class analysis
  • latent profile analysis
  • latent trait analysis
  • principal components analysis
  • factor analysis
  • confirmatory factor analysis
  • structural equation models
  • Rasch models
  • item response theory
  • correspondence analysis
  • cluster analysis
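
As a sketch of one method from this list, principal components analysis can recover a small number of underlying dimensions from observed variables; the data below are synthetic, generated from a single latent factor.

```python
# PCA on three observed variables driven by one hidden factor.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 1))             # one hidden factor
noise = 0.1 * rng.normal(size=(200, 3))
observed = latent @ np.array([[1.0, 0.8, 0.6]]) + noise

pca = PCA(n_components=2).fit(observed)
print("variance explained:", pca.explained_variance_ratio_)
# The first component dominates: the three observed variables
# largely reflect a single latent dimension.
```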

Longitudinal data analysis

A statistical method used to analyse data that has been collected on the same individuals over an extended period. The approach allows for understanding how variables change over time or in response to interventions, for example.

There are various approaches to longitudinal data analysis, which include:

  • panel data models
  • Arellano-Bond estimation
  • cross-lagged panel models
  • growth curve models
  • growth mixture models
  • latent class growth analysis

Microdata methods

Microdata is data at unit level, directly collected from a specific unit of observation.

Methods involving microdata are:

  • data linkage
  • microeconomics

Data linkage

The process of matching records relating to the same individual from multiple datasets to form a new dataset.
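
A minimal sketch with hypothetical identifiers and records:

```python
# Data linkage: match records for the same individuals across
# two sources on a shared identifier.
import pandas as pd

survey = pd.DataFrame({"person_id": [101, 102, 103],
                       "employment": ["employed", "retired", "employed"]})
health = pd.DataFrame({"person_id": [101, 103, 104],
                       "gp_visits": [2, 5, 1]})

# An inner join keeps only individuals present in both sources.
linked = survey.merge(health, on="person_id", how="inner")
print(linked)
```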

Microeconomics

To understand economic behaviour, microeconomics relies on micro-level data to analyse economic decision-making at the individual level.

Multilevel models

Statistical approaches used to analyse data with a nested or hierarchical structure. They allow for modelling when data exhibits dependencies or correlations at different levels of aggregation.

Examples of multilevel models are:

  • hierarchical models
  • mixed models
  • random effects

Hierarchical models

Account for the nested structure of data and allow for the estimation of within-group and between-group variability.

Mixed models

Incorporate fixed and random effects. Fixed effects capture the average relationships between variables across all levels. Random effects identify the variability specific to each group or level.
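
A minimal sketch with statsmodels on simulated data, with a fixed effect of time and a random intercept for each group:

```python
# Mixed model: average effect of time (fixed) plus group-specific
# intercepts (random). Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
groups = np.repeat(np.arange(10), 8)              # 10 groups, 8 obs each
time = np.tile(np.arange(8), 10)
group_effect = rng.normal(0, 2, size=10)[groups]  # random intercepts
y = 1.5 * time + group_effect + rng.normal(size=80)

df = pd.DataFrame({"y": y, "time": time, "group": groups})
fit = smf.mixedlm("y ~ time", df, groups=df["group"]).fit()
print(fit.summary())
```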

Random effects

Allow for the identification of group-level effects.

Non-parametric approaches

Methods that do not make assumptions about the characteristics or parameters of the data, or whether the data is quantitative or qualitative in nature.
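
As one illustration (our choice, not named in the source), the Mann-Whitney U test compares two samples using ranks rather than an assumed distribution:

```python
# Rank-based comparison of two made-up samples, with no
# distributional assumptions.
from scipy import stats

sample_a = [12.1, 9.8, 15.2, 11.4, 10.7]
sample_b = [8.9, 9.1, 7.5, 10.2, 8.0]

stat, p = stats.mannwhitneyu(sample_a, sample_b)
print(f"U = {stat}, p = {p:.3f}")
```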

Nonparametric maximum likelihood (NPML)

Useful when dealing with complex or irregular data distributions and allows for flexibility in modelling. NPML aims to find the set of parameters that maximises the likelihood function without making assumptions about the distribution.

Q methodology

A method that analyses subjective viewpoints and attitudes and combines both qualitative and quantitative techniques.

Qualitative methods capture subjective opinions, whilst quantitative factor analysis is applied to explore correlations between the qualitative results.

Regression models

Used to explore the relationship between dependent and independent variables, analysing how changes in the independent variables influence the dependent variable.

Regression models are used to make predictions and for estimating the strength and significance of these relationships.

There are several variations suited for different data types and research questions:

  • Ordinary Least Squares (OLS)
  • Generalised linear model (GLM)
  • Generalised least squares (GLS)
  • Analysis of variance (ANOVA)
  • Analysis of covariance (ANCOVA)
  • linear regression
  • log-linear regression
  • logistic regression
  • probit regression
  • discrete choice or count models
  • instrumental variables estimation
  • heteroskedasticity
  • Poisson regression
  • Rasch modelling
  • additive intensity model
  • ordinal regression
  • regression discontinuity
  • categorical data analysis
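
A minimal sketch of the first variation listed, Ordinary Least Squares, fitted to simulated data:

```python
# OLS: estimate how the dependent variable responds to the
# independent variable. True intercept 2.0, true slope 0.7.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.7 * x + rng.normal(size=100)

X = sm.add_constant(x)              # add intercept term
fit = sm.OLS(y, X).fit()
print(fit.params)                   # estimates close to (2.0, 0.7)
```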

Simulation

Data simulation is the process of generating artificial data representative of the characteristics of real-world data, incorporating pre-defined assumptions and parameters.

Simulation allows for the investigation of different scenarios and evaluation into the influence of factors on outcomes of interest.
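
A minimal sketch: generate artificial incomes under pre-defined, purely illustrative assumptions, then compare an outcome of interest across two scenarios.

```python
# Simulate a population under assumed parameters and compare the
# share falling below a fixed threshold in two scenarios.
import numpy as np

rng = np.random.default_rng(5)

def simulate(median_income: float, n: int = 10_000) -> float:
    incomes = rng.lognormal(mean=np.log(median_income), sigma=0.5, size=n)
    return np.mean(incomes < 15_000)

print("scenario A:", simulate(median_income=25_000))
print("scenario B:", simulate(median_income=30_000))
```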

Small area estimation

A method for estimating population characteristics within smaller geographical areas. Methods include integrating multiple data sources, for example combining survey data with additional information such as census or administrative data.

This approach provides more accurate and reliable estimates for smaller geographical areas where direct sample sizes may be limited.

Spatial data analysis

Aims to provide location-based insights through the examination and interpretation of data that possess geographic or spatial characteristics. Spatial data analysis allows for the identification of patterns, emerging trends, and relationships that are specific to the spatial characteristics of the data.

The method provides understanding of the relationship between geographic factors and the phenomena being examined.

Statistical theory and methods of inference

Statistical theory provides the theoretical foundation for understanding the uncertainty, patterns, relationships and trends within the data.

Methods of inference refer to the practical application of these statistical techniques in drawing conclusions based on observed data.

Time series analysis

Time series analysis enables the examination and modelling of data that has been collected over time. Such analysis focuses on understanding patterns and trends present in sequential data points, allowing the analyst to:

  • make predictions
  • identify patterns
  • uncover relationships between variables

There are assumptions and concerns, such as seasonality and stationarity, that the analyst would need to address when running such analysis.

Time series analysis includes:

  • forecasting
  • space-time path

Forecasting

Allows for making predictions about future values based on past observations.
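
One simple illustrative method (our choice) is simple exponential smoothing, where each forecast is a weighted average of past observations with weights decaying over time:

```python
# Simple exponential smoothing on a made-up series.
import numpy as np

series = np.array([112, 118, 121, 119, 125, 130, 128])
alpha = 0.4                  # smoothing parameter (assumed)

level = series[0]
for obs in series[1:]:
    level = alpha * obs + (1 - alpha) * level

print(f"one-step-ahead forecast: {level:.1f}")
```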

Space-time path

Examines whether the data changes over both time and space.

Other

Other methods include:

  • Boolean algebra
  • dynamic models
  • formal logic
  • generalised method of moments (GMM)
  • propensity score matching
  • sensitivity analysis
  • simultaneous equation models