News: Introducing the Reference Data Management Framework
One of the ways the Integrated Data Service (IDS) will allow accredited analysts and researchers to work at pace is through the way it links anonymised data together.
The Reference Data Management Framework, often referred to as the RDMF, is the framework the IDS uses to improve how data can be linked for analysis. You may already be familiar with it, as the RDMF features in the Office for National Statistics' (ONS) data strategy.
It is a trusted, scalable solution that works in practice.
The need
Decision makers want the best possible insight. It will come as no surprise that any decision needs a timely and complete picture of the available knowledge. Analysts and researchers often have to piece this picture together from various sources. Historically, this has not always been easy, and it takes time, particularly for complex issues or when working at scale.
As a next-generation research environment, the IDS provides the tools and technologies that let users access and integrate data from across government and the wider research community. This helps them produce new, previously untapped insight.
The RDMF will allow analysts and researchers to manage data securely within the IDS, while also enabling a more efficient approach to linkage. Data linking, sometimes known as data matching, is a technique that combines datasets to enrich the information they contain. Read more about data linking on the ONS website.
The framework is designed to change the way anonymised data about our society, environment and economy are made available. As a result, our community of IDS analysts and researchers can easily extend the analytical potential of their datasets with reference data.
RDMF in practice
The RDMF is an ambitious framework, and it is free for IDS users on any project that needs it.
The framework is made up of component indexes of reference data, which can be matched to one another through a common identifier. This makes the framework:
- quick
- methodologically consistent
- scalable
- of the required level of quality
- future-proofed, with automation to be introduced over time
The core indexes typically include:
- Business Index
- Demographic Index
- Address Index
Having these indexes reduces the amount of data linkage that must be performed case by case, because much of the "heavier" linkage work has already been done. Because each dataset is indexed and matched individually, datasets can be linked easily and flexibly at the point of use.
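To make linking through a common identifier more concrete, here is a minimal Python sketch. Everything in it is hypothetical. The toy Address Index, the `ADDR-0001`-style identifiers, the `ref_id` field, the `energy` and `housing` datasets and the exact-match-after-normalisation rule are all illustrative assumptions, not the way the RDMF is actually implemented. In the sketch, each dataset is matched to the index once and its raw address is replaced by the identifier. Linking at the point of use then becomes a simple join on that identifier.

```python
# Minimal sketch of linkage through a central index. The index contents,
# the "ADDR-0001"-style identifiers, the "ref_id" field and the exact-match
# rule are hypothetical; real address matching is far more sophisticated.


def normalise(address: str) -> str:
    """Crude normalisation: lower-case and collapse runs of whitespace."""
    return " ".join(address.lower().split())


# Hypothetical reference index: normalised address -> common identifier.
ADDRESS_INDEX = {
    normalise("1 High Street, Newport"): "ADDR-0001",
    normalise("22 Station Road, Titchfield"): "ADDR-0002",
}


def attach_index_id(records, key):
    """Match each record to the index once, replacing the raw key with 'ref_id'.

    Records with no match in the index get ref_id = None.
    """
    indexed = []
    for record in records:
        out = {k: v for k, v in record.items() if k != key}
        out["ref_id"] = ADDRESS_INDEX.get(normalise(record[key]))
        indexed.append(out)
    return indexed


def link_on_id(left, right):
    """Inner join of two indexed datasets on 'ref_id' at the point of use."""
    right_by_id = {}
    for record in right:
        if record["ref_id"] is not None:
            right_by_id.setdefault(record["ref_id"], []).append(record)
    linked = []
    for record in left:
        # Unmatched records (ref_id None) are never keys in right_by_id.
        for match in right_by_id.get(record["ref_id"], []):
            linked.append({**record, **match})
    return linked


# Two hypothetical datasets, each matched to the index once.
energy = attach_index_id(
    [{"address": "1  High Street, Newport", "kwh": 3100},
     {"address": "9 Unknown Lane", "kwh": 2800}],
    key="address",
)
housing = attach_index_id(
    [{"address": "1 HIGH STREET, NEWPORT", "bedrooms": 3},
     {"address": "22 Station Road, Titchfield", "bedrooms": 2}],
    key="address",
)
print(link_on_id(energy, housing))
# [{'kwh': 3100, 'ref_id': 'ADDR-0001', 'bedrooms': 3}]
```

The design point is that a new dataset only has to be matched to the index once. After that, it can be joined with any other indexed dataset using the identifier alone.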
The benefits
The main benefit of matching datasets to central indexes, rather than directly to each other, is that this is the most efficient and sustainable way to work.
It allows:
- a consistent approach to data integration, which potentially all government departments can aspire to through the IDS
- the development of matching algorithms
- quality reporting to support sound decision making
- data matching just once, rather than multiple times
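A little arithmetic illustrates the "match once" benefit. This rests on two simplifying assumptions: every pair of n datasets might need to be linked, and each linkage exercise costs roughly the same. Under those assumptions, linking datasets directly needs n(n-1)/2 exercises, while matching each dataset to a central index needs only n. The function names below are hypothetical.

```python
def pairwise_linkages(n: int) -> int:
    """Direct linkage of every pair of n datasets: n choose 2."""
    return n * (n - 1) // 2


def index_linkages(n: int) -> int:
    """Hub-and-spoke linkage: each dataset is matched to the central index once."""
    return n


for n in (5, 20, 100):
    print(n, pairwise_linkages(n), index_linkages(n))
# 5 10 5
# 20 190 20
# 100 4950 100
```

For 100 datasets, that is 4,950 pairwise exercises against 100 index matches. In practice, the saving depends on how many datasets actually need to be linked.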
Anonymisation is mandatory
The IDS community of users can trust that datasets are reliably de-identified.
To guard against the risk of people being re-identified after linkage, the RDMF restricts access and applies specific changes to the data, known as “disclosure control”. This means the anonymisation cannot be unpicked, while still allowing meaningful analysis.
Watch this space!
Applying a trusted framework to improve the process means that much of the time-consuming preparation work has already been done, allowing researchers and analysts to analyse at pace.
We will be telling you more over the coming months.